
7 3D Shape Matching for Retrieval and Recognition


7.2 Literature Review

To support similarity search between two multimedia objects, we must define a model that allows us to compare them. The comparison cannot be done directly, because we are interested in retrieving similar objects stored in a database rather than finding identical copies. A lot of work has been done on assessing the visual similarity between 3D objects when the required model is global. By global, we mean that, given a 3D object, an algorithm retrieves those objects in the database that look visually similar, and the whole shape structure is used for the comparison. The most widely used model consists of obtaining an abstract representation of an object, such as a feature vector or a graph, and computing the similarity based on these representations. For example, distance metrics are commonly used to calculate the similarity between feature vectors. Another similarity model, which has gained much interest, involves local features, especially for non-rigid and partial shape retrieval and recognition. The problem of defining a similarity model is very challenging. For example, a visual model allows us to represent a shape by its appearance. However, such a model cannot discriminate between two shapes that are semantically similar but differ in appearance.

Bustos et al. [26] described a methodology that consists of representing 3D objects as real vectors of a certain dimension obtained through a transformation function. Then, these vectors can be organized in a multidimensional index, where the similarity corresponds to the proximity in the space where vectors are defined. The Minkowski distance family is usually used to measure the proximity of feature vectors. The authors presented experimental results comparing several transformation functions where the depth-buffer descriptor proposed by Vranic [106] showed the best results. Iyer et al. [55] and Tangelder et al. [100] also discussed techniques for 3D object retrieval, identifying future trends and important issues to be addressed. Also, the Princeton Shape Benchmark [96] provides a reference collection and experimental tests for several descriptors. Other surveys and reports were written by Funkhouser et al. [42], Del Bimbo et al. [15] and Bustos et al. [25, 27]. In 3D shape recognition, Campbell and Flynn [28] presented a comprehensive study of representation and recognition of free-form shapes.
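To make the feature-vector model concrete, the following minimal sketch (Python with NumPy; all names are illustrative and not taken from [26]) answers a retrieval query with a Minkowski distance over precomputed feature vectors. A real system would replace the linear scan with a multidimensional index.

```python
import numpy as np

def minkowski_distance(x, y, p=2.0):
    """Minkowski distance between two feature vectors (p=1: Manhattan, p=2: Euclidean)."""
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

def retrieve(query_vector, db_vectors, k=5, p=2.0):
    """Return the indices of the k database objects whose feature vectors
    are closest to the query (linear scan; an index would replace this)."""
    dists = np.array([minkowski_distance(query_vector, v, p) for v in db_vectors])
    return np.argsort(dists)[:k]

# Toy usage: 100 database objects described by 64-dimensional feature vectors.
db_vectors = np.random.rand(100, 64)
query_vector = np.random.rand(64)
print(retrieve(query_vector, db_vectors, k=5))
```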

Since then, many other techniques have been proposed to improve effectiveness and efficiency. A brief summary of representative approaches for shape retrieval is given in Table 7.1. Due to space limitations, this table does not show all techniques presented so far. We recommend reading Sect. 7.6 for further references.

Table 7.1 shows a simple taxonomy based on the shape information that is extracted and used in matching. Histogram-based methods summarize certain shape properties, such as the distribution of distances between points on the surface [82], angle information [86], the distribution of face normals [59], and so forth. These properties are used to build histograms which represent the shapes, and matching is done with common histogram dissimilarity measures. In contrast, transform-based methods apply a transform to a shape to convert it into a numerical representation of the underlying object. Some examples of such transforms are the Fourier Transform [39], the Radon Transform [36], and the Wavelet Transform [67]. Image-based methods represent a 3D object as a set of projected images, so the matching becomes an image matching problem.

Table 7.1 3D shape retrieval methods

Type              Method
Histogram-based   Shape distributions [82]
                  Generalized shape distributions [75]
                  Angle histogram [86]
                  Extended Gaussian images [59]
Transform-based   3D Fourier [39]
                  Angular radial transform [91]
                  Spherical trace transform [114]
                  Rotation invariant spherical harmonics [60]
                  Concrete radialized spherical projection [84]
                  Spherical wavelet descriptor [67]
Image-based       Depth-buffer descriptor [106]
                  Silhouette descriptor [106]
                  Light-field descriptor [30]
                  Depth-Line descriptor [29]
Graph-based       Reeb graph [104]
                  Skeleton graph [99]
Local features    Salient geometric features [44]
                  Salient spectral features [52]
                  Segmentation-based visual vocabulary [103]
                  Heat kernel signatures [83]

For instance, Vranic [106] proposed to take the frequency spectrum of depth images, Chen [30] considered silhouettes taken from directions corresponding to the vertices of a dodecahedron, and Chaouch and Verroust-Blondet [29] converted a set of depth images into character strings, with the matching performed using variations of the well-known edit distance. Graph-based methods represent shapes by graph structures, such as Reeb graphs [51], which contain information about the connectivity of the shape's parts.
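As a concrete illustration of the histogram-based family, the sketch below approximates the D2 shape distribution of [82]: a normalized histogram of distances between randomly chosen point pairs. It assumes the shape is already given as points sampled roughly uniformly from the surface and scale-normalized so that distances fall in [0, 1]; the original method samples points uniformly over the triangle mesh, which is omitted here.

```python
import numpy as np

def d2_shape_distribution(points, num_pairs=100_000, num_bins=64, max_dist=1.0):
    """D2-style shape distribution: normalized histogram of distances between
    randomly chosen point pairs on a pre-sampled, scale-normalized surface."""
    n = len(points)
    i = np.random.randint(0, n, num_pairs)
    j = np.random.randint(0, n, num_pairs)
    d = np.linalg.norm(points[i] - points[j], axis=1)
    hist, _ = np.histogram(d, bins=num_bins, range=(0.0, max_dist))
    return hist / hist.sum()

def histogram_distance(h1, h2):
    """A common histogram dissimilarity (L1 norm); other measures can be substituted."""
    return np.abs(h1 - h2).sum()
```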

There are several drawbacks with the aforementioned techniques. On the one hand, many of them (especially image-based and transform-based methods) are pose sensitive. That is, a pose normalization step must be applied before the feature extraction process. Clearly, partial and non-rigid matching cannot be addressed with these methods. On the other hand, graph-based methods rely on the topological properties of a 3D object, so topological changes affect the description process.
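A typical pose normalization step is PCA alignment, sketched below under simplifying assumptions: the shape is treated as a plain point set, and the axis-flip ambiguities of the eigenvectors (usually resolved with additional heuristics) are ignored. This is a generic illustration, not the normalization procedure of any particular method cited above.

```python
import numpy as np

def pca_pose_normalization(points):
    """Normalize translation, rotation and scale of a point set: center at the
    centroid, align principal axes with the coordinate axes, scale to unit
    average radius."""
    centered = points - points.mean(axis=0)
    cov = np.cov(centered.T)                      # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]             # axes sorted by decreasing variance
    aligned = centered @ eigvecs[:, order]
    scale = np.mean(np.linalg.norm(aligned, axis=1))
    return aligned / scale
```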

Recently, approaches based on local descriptors have received special attention due to their ability to support non-rigid and partial matching. In these approaches, each shape is represented as a set of descriptors, and the matching is performed by searching for the best correspondence between them. Gal and Cohen-Or [44] proposed to represent a 3D object as a set of salient geometric features, which characterize the complete object. Their scheme relies entirely on curvature information over the shape's surface, and the matching is done by indexing the salient features using geometric hashing [108], with a voting scheme to determine similar objects.

An interesting approach was given by Hu and Hua [52] to address the non-rigid and partial matching problem. Their method consists of using the Laplace-Beltrami operator to detect and describe interest points in 3D objects. This operator captures the information of a mesh at different scales, and it is also invariant to isometries, an important property for non-rigid matching. The authors proposed an energy function based on the Laplace-Beltrami spectrum to detect interest points together with their associated scales. Using these points, along with their scales, it is possible to extract a descriptor for each interest point from the local Laplace-Beltrami spectrum. The matching is performed by solving an integer quadratic programming problem involving the two sets of features belonging to the shapes to be matched.
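The sketch below only illustrates the spectral ingredient of such methods: it computes the low end of a mesh Laplacian spectrum with SciPy. For simplicity it uses a uniform ("umbrella") graph Laplacian rather than the cotangent-weight Laplace-Beltrami discretization, and it omits the interest point detection and the quadratic-programming matching of [52]; all names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def umbrella_laplacian(num_vertices, faces):
    """Uniform graph Laplacian L = D - A over the mesh edge graph; a rough
    stand-in for the cotangent-weight Laplace-Beltrami discretization."""
    edges = set()
    for a, b, c in faces:
        for i, j in ((a, b), (b, c), (c, a)):
            edges.add((min(i, j), max(i, j)))      # undirected edge, deduplicated
    i, j = np.array(sorted(edges)).T
    rows, cols = np.concatenate([i, j]), np.concatenate([j, i])
    A = sp.coo_matrix((np.ones(len(rows)), (rows, cols)),
                      shape=(num_vertices, num_vertices)).tocsr()
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
    return (D - A).tocsc()

def laplacian_spectrum(num_vertices, faces, k=30):
    """Smallest k eigenvalues/eigenvectors of the mesh Laplacian; the spectrum
    is invariant to rigid motions and (approximately) to isometries."""
    L = umbrella_laplacian(num_vertices, faces)
    # which='SM' is simple but slow on large meshes; shift-invert is the usual speed-up.
    return eigsh(L, k=k, which='SM')
```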

One of the most widely used approaches for managing local descriptors is the bag-of-features approach. It begins by clustering all the local descriptors from an entire collection and calculating the centroid of each cluster. Then, each shape is represented as a histogram with as many bins as there are clusters: each descriptor adds one to the bin corresponding to the closest centroid. Toldo et al. [103] proposed to segment a given mesh and build a descriptor for each segment; a bag-of-features approach then combines all the descriptors of the mesh. Similarly, Ovsjanikov et al. [83] proposed a soft version of the bag-of-features approach applied to dense descriptors based on the heat kernel signature, originally introduced by Sun et al. [97], which is related to the Laplace-Beltrami operator. The authors also presented a spatially sensitive bag-of-features technique, which gave good results in shape retrieval.
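A minimal hard-assignment bag-of-features pipeline might look as follows (SciPy k-means; descriptor extraction is assumed to have happened elsewhere). This is a generic sketch, not the segment descriptors of [103] nor the soft assignment of [83].

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def build_codebook(all_descriptors, num_words=64):
    """Cluster the local descriptors of the whole collection; the cluster
    centroids form the visual vocabulary (codebook)."""
    codebook, _ = kmeans2(all_descriptors, num_words, minit='++')
    return codebook

def bag_of_features(shape_descriptors, codebook):
    """Hard-assignment histogram: every local descriptor of a shape votes for
    its closest visual word; the normalized histogram describes the shape."""
    dists = np.linalg.norm(shape_descriptors[:, None, :] - codebook[None, :, :], axis=2)
    words = dists.argmin(axis=1)
    hist = np.bincount(words, minlength=len(codebook)).astype(float)
    return hist / hist.sum()
```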

Obviously, the use of local features introduces a new problem: the amount of information used in the matching. With these approaches, a shape is represented by a set of descriptors, and the problem of matching becomes non-trivial. In addition, the matching step is more expensive than computing a distance between two feature vectors, as is done in global matching. In this respect, future research directions could be motivated by this kind of matching.

With respect to 3D shape recognition, Table 7.2 shows a selection of techniques proposed to date.

As noted, most of the presented techniques make extensive use of local features, because these can mitigate the effect of occlusion in cluttered scenes. Nevertheless, image-based proposals have also been considered. Lee and Drew [69] extracted contours from 2D projections around a 3D object, and subsequently a scale-space curvature image was obtained for each projection. These images were used to identify the class of an object and then to determine the object within the selected class. In addition, Cyr and Kimia [35] extracted 2D views which were grouped into view sets called aspects. These aspects were represented by a prototype view to accelerate the recognition process given views from new objects.

Table 7.2 3D shape recognition methods

Type            Method
Image-based     Eigen-scale-space contours [69]
                Aspect-graph [35]
Local features  Local features histogram [50]
                Spin images [56]
                Spherical spin images [94]
                3D shape contexts [41]
                Point signatures [33]
                Point fingerprint [98]
                Harmonic shape images [115]
                Cone Curvature [3]
                Local surface patch [32]
                Pyramid Matching [72]

Although it is possible to apply any of the shape retrieval approaches to object recognition, the general technique that has received most attention is matching by local features. In their seminal work, Chua and Jarvis [33] presented the point signature, a 1D descriptor for points on a surface. To construct the descriptor around a 3D surface point, a 3D space curve is generated as the intersection of a sphere centered at that point with the surface. A plane is fitted to this curve and translated along its normal until it contains the sphere center (i.e. the surface point). The distance profile of the space curve to this plane then forms the point's local surface descriptor. In matching, correspondences were found and a voting scheme allowed the determination of objects in a scene.
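The following sketch approximates a point signature on a sampled point cloud: points lying in a thin spherical shell around the query point stand in for the intersection curve, a plane is fitted by SVD, and the signed distances to the plane, binned by angle, form the 1D profile. The original method works on the mesh and fixes the angular origin with a reference direction, which is omitted here; radius, shell width and bin count are placeholder parameters.

```python
import numpy as np

def point_signature(points, p, radius=0.05, shell=0.005, num_bins=36):
    """Approximate point signature: distance profile of the sphere-surface
    intersection curve to a fitted plane, sampled by angle around the point p."""
    d = np.linalg.norm(points - p, axis=1)
    curve = points[np.abs(d - radius) < shell]        # thin shell ~ intersection curve
    if len(curve) < 3:
        return np.zeros(num_bins)
    centered = curve - curve.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    normal = vt[-1]                                   # plane normal (least-variance direction)
    signed_dist = (curve - p) @ normal                # plane translated to contain p
    u, v = vt[0], vt[1]                               # in-plane frame for the angular parameter
    angles = np.arctan2((curve - p) @ v, (curve - p) @ u)
    bins = ((angles + np.pi) / (2 * np.pi) * num_bins).astype(int) % num_bins
    signature = np.zeros(num_bins)
    for b in range(num_bins):
        if np.any(bins == b):
            signature[b] = signed_dist[bins == b].mean()
    return signature
```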

Following the idea of representing the geometry surrounding a point, Johnson and Hebert [57] proposed their well-known and well-studied spin images. Given an object, the authors constructed 2D descriptors for points over the surface. As the name suggests, a spin image is obtained by spinning a half-plane around the analyzed point's normal and accumulating the surface points that fall into bins of that half-plane. The matching was performed by finding correspondences between the spin images of an object and a scene, and subsequently a geometric verification with a modified version of the iterative closest point (ICP) algorithm [14] was performed. A variation of this technique is the spherical spin image, presented by Ruiz-Correa et al. [94].
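A basic spin image can be sketched as a 2D histogram over the cylindrical coordinates (alpha, beta) of neighboring points, where beta is the height along the point's normal and alpha the radial distance from the normal line. The bin count and support radius below are placeholders; Johnson and Hebert tie them to the mesh resolution.

```python
import numpy as np

def spin_image(points, p, n, image_size=16, support=0.1):
    """Spin image at oriented point (p, n): 2D histogram of (alpha, beta), with
    beta the signed height along the normal n and alpha the distance from the
    normal line through p, restricted to a local support region."""
    diff = points - p
    beta = diff @ n
    alpha = np.sqrt(np.maximum(np.sum(diff ** 2, axis=1) - beta ** 2, 0.0))
    mask = (alpha < support) & (np.abs(beta) < support)
    hist, _, _ = np.histogram2d(alpha[mask], beta[mask],
                                bins=image_size,
                                range=[[0, support], [-support, support]])
    return hist / max(hist.sum(), 1.0)
```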

Simpler information has also been employed. For instance, Hetzel et al. [50] used pixel depth, normals and curvature information, combining them into multi-dimensional histograms. The matching step was then performed using the χ²-divergence and a Bayesian a posteriori classifier. Sun et al. [98] proposed the point fingerprint, which consists of geodesic contours projected onto a point's tangent plane. Frome et al. [41] introduced 3D shape contexts and harmonic shape contexts. The idea behind the shape context approach is to accumulate the surrounding points using concentric spheres around the analyzed point. The authors proposed to use