
7 3D Shape Matching for Retrieval and Recognition


Table 7.3 R-Precision values for evaluated techniques, reproduced from Bustos et al. [26]

Method                                    R-Precision
Depth Buffer                              0.3220
Voxel                                     0.3026
Complex valued shading                    0.2974
Rays with spherical harmonics             0.2815
Silhouette                                0.2736
3DDFT                                     0.2622
Shading                                   0.2386
Ray-based                                 0.2331
Rotation invariant point cloud            0.2265
Rotation invariant spherical harmonics    0.2219
Shape distribution                        0.1930
Ray moments                               0.1922
Cords-based                               0.1728
3D moments                                0.1648
Volume                                    0.1443
Principal curvature                       0.1119

7.3.1.4 Complexity Analysis

Given a 3D object with F triangular faces, let n be the number of bins for the depth buffer and let k be the number of coefficients retained after the FFT. The complexity of each stage of the method is as follows:

Construction of the depth images: $O(Fn^2)$. In the worst case, each face is projected onto the entire image.

Fast Fourier Transform: $O(n^2 \log n)$.

Linear search: $O(P(k^2 + k + 1))$, where P is the number of descriptors stored in the collection. The term in k arises because each distance computation is performed between descriptors of dimension $6(k^2 + k + 1)$.

Therefore, the total complexity of this method is dominated by the complexity of the Fourier transform, i.e. $O(n^2 \log n)$.
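To make the cost breakdown concrete, the following is a minimal sketch of the descriptor-extraction and linear-search stages, assuming (as in the usual depth-buffer descriptor) six n × n depth images rendered from the faces of the object's bounding cube. The particular low-frequency coefficient selection below, ranked by frequency radius, is an illustrative assumption; the text does not spell out the exact set.

```python
import numpy as np

def depth_buffer_descriptor(depth_images, k):
    """Sketch: descriptor from six n x n depth images (one per face
    of the bounding cube). Each FFT costs O(n^2 log n); keeping
    k^2 + k + 1 coefficient magnitudes per image yields a descriptor
    of dimension 6(k^2 + k + 1), matching the complexity analysis."""
    features = []
    for img in depth_images:
        n = img.shape[0]
        spectrum = np.fft.fft2(img)
        # Rank coefficients by frequency radius (accounting for the
        # FFT's wrap-around layout) and keep the lowest k^2 + k + 1.
        order = sorted(
            (min(u, n - u) ** 2 + min(v, n - v) ** 2, u, v)
            for u in range(n) for v in range(n))
        features += [abs(spectrum[u, v])
                     for _, u, v in order[:k * k + k + 1]]
    return np.asarray(features)

def linear_search(query, collection):
    """O(P d) scan over P stored descriptors of dimension d,
    returning indices sorted by Euclidean distance to the query."""
    return np.argsort(np.linalg.norm(collection - query, axis=1))
```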

7.3.2 Spin Images for Object Recognition

In this section, we describe a 3D object recognition technique with support for occlusion and cluttered scenes. This work was originally proposed by Johnson and Hebert [56, 57] for recognizing objects in complex scenes obtained through a structured-light range camera, for use in robotic systems. It has been recognized as pioneering work in the use of 3D shape matching for computer vision tasks, and its relative success has generated increasing interest in these kinds of techniques to support high-level vision tasks. In addition, the central idea behind this technique, the spin image, is one of the pioneering local 3D shape descriptors. Broadly speaking, this technique works as follows:

Given a set of 3D models (scans), we calculate a spin image for each vertex and store them in a huge collection of spin images.

Given a scene, possibly cluttered and with occlusions, random vertices are selected and spin images are computed for them. Then, we compare these spin images with those previously stored and select possible correspondences.

Finally, we use geometric consistency tests and a variation of the iterative closest point algorithm to perform correspondence validation and matching.

In order to calculate the spin images for a 3D shape, a uniform mesh is required. By uniform, we mean that the distances between adjacent vertices remain close to the median of all such distances. In fact, the mesh resolution is defined as the median of all edge lengths in the shape. Johnson [56] proposed an efficient algorithm to control the mesh resolution, based on mesh simplification schemes [45]. In addition, vertices have to be oriented: each vertex must have an associated normal pointing towards the outside of the shape. In what follows, we assume that the shape is uniform and that each vertex is properly oriented.
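As a concrete illustration, here is a minimal sketch of the mesh-resolution computation (the median edge length defined above), assuming the mesh is given as a numpy vertex array and a list of triangle index triples:

```python
import numpy as np

def mesh_resolution(vertices, faces):
    """Median length of all unique edges in a triangle mesh.

    vertices: (V, 3) float array; faces: iterable of (a, b, c)
    vertex-index triples. This follows the text's definition of
    mesh resolution as the median of all edge lengths."""
    edges = set()
    for a, b, c in faces:
        for u, v in ((a, b), (b, c), (c, a)):
            edges.add((min(u, v), max(u, v)))  # deduplicate shared edges
    lengths = [np.linalg.norm(vertices[u] - vertices[v])
               for u, v in edges]
    return float(np.median(lengths))
```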

To build the spin image of a vertex, we first define a local basis at that vertex; accumulating the surrounding vertices in this local basis allows us to create a pose-invariant local description. In addition, we can control how local this description is: spin images can be used with large support for alignment and registration tasks, and with small support for recognition in cluttered scenes.

We define an oriented point as a pair O = (p, n), which maintains the coordinates of a point p along with its associated normal vector n. The local basis is formed by the following elements:

The point p.

The normal n and the line L through p parallel to n.

The tangent plane P through p oriented perpendicularly to n.

We can represent any point q in this basis, as shown in Fig. 7.3, through two cylindrical coordinates: α, the perpendicular distance from q to the line L; and β, the signed perpendicular distance from q to the plane P.

Fig. 7.3 Local basis for point p

We define the spin-map $S_O$ as a function that projects 3D points q to the local 2D coordinates defined with the previous elements:

$$S_O : \mathbb{R}^3 \to \mathbb{R}^2, \qquad S_O(q) \to (\alpha, \beta) = \left( \sqrt{\|q - p\|^2 - \big(n \cdot (q - p)\big)^2},\ n \cdot (q - p) \right) \tag{7.2}$$
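The spin-map is straightforward to implement. Below is a minimal sketch of Eq. (7.2), assuming p, n, and q are numpy arrays and n is a unit vector:

```python
import numpy as np

def spin_map(p, n, q):
    """Spin-map S_O of Eq. (7.2): project a 3D point q into the
    cylindrical coordinates (alpha, beta) of the oriented point
    O = (p, n). The normal n is assumed to be a unit vector."""
    d = q - p
    beta = float(np.dot(n, d))  # signed distance to the tangent plane P
    # distance to the line L; max() guards against tiny negative
    # values from floating-point round-off
    alpha = float(np.sqrt(max(0.0, np.dot(d, d) - beta ** 2)))
    return alpha, beta
```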

The process of spin image formation uses the previously defined function, accumulating points at their (α, β) image coordinates. This can be seen as spinning a matrix around the point's normal and storing the occurrences of surrounding points at the respective coordinates of the matrix. The resulting spin image thus looks like an occurrence histogram in the cylindrical coordinate system defined by the local basis.

In order to create a spin image, three parameters have to be defined:

Bin size (bin), the spatial extent of the bins in the image.

Image width (W), the number of bins in both image directions. Usually, spin images are square.

Support angle (A_s), the maximum angle between normals for contributing points.

Let A = (p_A, n_A) be an oriented point for which we want to build a spin image. For each oriented point B = (p_B, n_B) on the shape, we use the local basis and the spin-map function to obtain the coordinates (α, β). Then, the bin corresponding to these coordinates is given by:

$$i = \left\lfloor \frac{\frac{W}{2}\,\mathit{bin} - \beta}{\mathit{bin}} \right\rfloor, \qquad j = \left\lfloor \frac{\alpha}{\mathit{bin}} \right\rfloor \tag{7.3}$$

Instead of directly accumulating the occurrence in the respective bin, the authors suggested the use of bilinear interpolation to accumulate the occurrence over neighboring positions. The bilinear weights are calculated as follows:

$$a = \frac{\alpha}{\mathit{bin}} - j, \qquad b = \frac{\frac{W}{2}\,\mathit{bin} - \beta}{\mathit{bin}} - i \tag{7.4}$$

With these weights, the image is updated as follows:

$$\begin{aligned}
I(i, j) &= I(i, j) + (1-a)(1-b) \\
I(i, j+1) &= I(i, j+1) + (1-a)b \\
I(i+1, j) &= I(i+1, j) + a(1-b) \\
I(i+1, j+1) &= I(i+1, j+1) + ab
\end{aligned} \tag{7.5}$$
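Putting Eqs. (7.3)-(7.5) together, the per-point accumulation step could be sketched as follows. Padding the image to (W + 1) × (W + 1) so that the updates to i + 1 and j + 1 stay in range is an illustrative choice, not something prescribed by the text.

```python
import math

def accumulate_point(I, alpha, beta, W, bin_size):
    """Add one (alpha, beta) sample to the spin image I.

    I is assumed to be a (W + 1) x (W + 1) numpy array so that the
    bilinear updates to i + 1 and j + 1 stay inside the image."""
    i = math.floor((W / 2 * bin_size - beta) / bin_size)  # Eq. (7.3)
    j = math.floor(alpha / bin_size)
    if not (0 <= i < W and 0 <= j < W):
        return                          # sample falls outside the support
    a = alpha / bin_size - j                              # Eq. (7.4)
    b = (W / 2 * bin_size - beta) / bin_size - i
    I[i, j] += (1 - a) * (1 - b)                          # Eq. (7.5)
    I[i, j + 1] += (1 - a) * b
    I[i + 1, j] += a * (1 - b)
    I[i + 1, j + 1] += a * b
```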


Fig. 7.4 Spin image generation process. (a) Input mesh, (b) mesh with normals, (c) spin images. At the top, the parameters were W = 25, A_s = π, and bin = {1, 2, 3, 4} times the mesh resolution. In the middle, the parameters were W = {25, 50, 100, 200}, A_s = π, and bin = 1 times the mesh resolution. At the bottom, the parameters were W = 25, bin = 4 times the mesh resolution, and A_s = {π, π/2, π/3, π/4}. The resolution of the spin images depends on W.

In addition, contributing points must satisfy a constraint on the angle between their normals; only points satisfying this condition are used in the spin image generation process:

$$\arccos(n_A \cdot n_B) < A_s \tag{7.6}$$

where A_s is the support angle. When A_s is small, better support for occlusion is provided, since points on the mesh whose normals point in a considerably different direction are likely to be due to occlusion.

In practice, the bin size must be set relative to the mesh resolution in order to preserve a good relation between sampling and descriptiveness. In the original experiments carried out by Johnson [56], the spin image width was 15; the support angle depends on how much occlusion we want to support, but a common value used in practice is 60 degrees. Figure 7.4 shows spin images generated with different values for each parameter.
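Combining the spin-map and accumulation sketches above with the support-angle test of Eq. (7.6), the generation of one spin image could look as follows. The function reuses spin_map and accumulate_point from the earlier sketches, and the 60-degree default for the support angle follows the practical value quoted above.

```python
import math
import numpy as np

def spin_image(p_A, n_A, points, normals, W, bin_size,
               A_s=math.radians(60)):
    """Sketch of spin image generation for the oriented point (p_A, n_A).

    points, normals: (N, 3) arrays of oriented surface points. Only
    points whose normals pass the support-angle test of Eq. (7.6)
    contribute to the image."""
    I = np.zeros((W + 1, W + 1))
    for p_B, n_B in zip(points, normals):
        # Eq. (7.6): skip points with too-different normal directions
        cos_angle = np.clip(np.dot(n_A, n_B), -1.0, 1.0)
        if math.acos(cos_angle) >= A_s:
            continue
        alpha, beta = spin_map(p_A, n_A, p_B)
        accumulate_point(I, alpha, beta, W, bin_size)
    return I
```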

Once a spin image has been calculated for each vertex of a model, the images are stored for the matching process. To determine possible correspondences, we then need a way to compare spin images. Given two spin images P and Q with