Source: 3D Imaging, Analysis and Applications, Springer-Verlag London (2012).

274

B. Bustos and I. Sipiran

7.3.1 Depth-Buffer Descriptor

The depth-buffer descriptor [106] is an image-based descriptor: it computes 2D projections of the 3D model and then derives the feature vector from those projections. The descriptor considers not only the silhouette of each projection of the 3D model, but also the depth information (the distance from the clipping plane, where the projection starts, to the 3D model).

The process to obtain the feature vector associated with the depth-buffer descriptor is summarized as follows.

1. Pose normalization: The depth-buffer descriptor starts with the 3D model oriented and scaled according to a predefined normalized pose.

2. Depth buffer construction: The feature extraction method renders six greyscale images using parallel projection (two projections for each principal axis). Each pixel in the 2D projection encodes, as an 8-bit grey value, the orthogonal distance from the viewing plane (i.e. a side of the bounding cube) to the object. These images correspond to the concept of z-buffers (depth buffers) in computer graphics.

3. Fourier transformation: After rendering, the method transforms the six images using the standard 2D discrete Fourier transform.

4. Selection of coefficients: The magnitudes of certain low-frequency coefficients of each transformed image contribute to the depth-buffer feature vector; taking k coefficients per image yields a dimensionality of 6k.
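Step 1 of the pipeline above can be sketched in a few lines of numpy. This is a simplified, point-based PCA normalization; the function name is our own, and note that Vranic's actual method uses a continuous, area-weighted PCA over the mesh surface rather than plain PCA on sample points:

```python
import numpy as np

def normalize_pose(points):
    """Rough pose normalization of an (m, 3) point sample of a 3D model."""
    # Translate the centroid to the origin.
    centered = points - points.mean(axis=0)
    # Principal axes = right singular vectors of the centered data
    # (equivalently, eigenvectors of the covariance matrix).
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    # Rotate onto the principal axes (largest variance first).
    aligned = centered @ vt.T
    # Scale isotropically so the largest absolute coordinate is 1.
    return aligned / np.abs(aligned).max()
```

Because the singular values are sorted, the variance of the output coordinates is non-increasing across the three axes, which fixes the orientation up to reflections (a known ambiguity of PCA-based normalization).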

7.3.1.1 Computing the 2D Projections

The first step of the depth-buffer descriptor computes 2D projections of the 3D model. To accomplish this, the model must first be normalized in pose (by means of PCA, for example), as this descriptor is not inherently invariant to rotations or scaling. Then, the model is enclosed in a bounding cube. Each face of this cube is divided into n × n cells (initialized to 0), which are used to compute the depth buffer for each 2D projection. Finally, the 3D model is orthogonally projected onto each face of the bounding cube. The value associated with each cell is the normalized orthogonal distance (a value in [0, 1]) between the face of the bounding cube and the orthogonally closest point of the 3D model.

Formally, let w be the width of the bounding cube. Let c be a cell on a face of the cube, and let p be the point on the surface of the 3D model that is orthogonally closest to c. The value associated with c is

value(c) = (w − δ(c, p)) / w,

where δ(c, p) is the distance from c to p.
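As an illustration, the per-cell values above can be computed for one face of the bounding cube with numpy. This is a sketch over a sampled point cloud (function and parameter names are our own); a faithful implementation would rasterize the mesh triangles rather than individual sample points:

```python
import numpy as np

def depth_buffer_face(points, n=256):
    """Depth buffer for the top face of the bounding cube, viewed along -z.

    points: (m, 3) array of surface samples of a pose-normalized model.
    Returns an (n, n) array holding value(c) = (w - delta(c, p)) / w,
    where delta is the distance from the viewing face to the closest point.
    """
    lo, hi = points.min(axis=0), points.max(axis=0)
    w = float((hi - lo).max())          # width of the bounding cube
    origin = (lo + hi) / 2.0 - w / 2.0  # minimal corner of the cube
    # Cell index of each point's orthogonal projection onto the n x n face.
    ij = np.clip(((points[:, :2] - origin[:2]) / w * n).astype(int), 0, n - 1)
    # With the viewing face at z = origin_z + w, delta(c, p) = face_z - p_z,
    # so value(c) = (w - delta) / w = (p_z - origin_z) / w, in [0, 1].
    vals = (points[:, 2] - origin[2]) / w
    buf = np.zeros((n, n))              # empty cells keep value 0
    # Per cell, keep the point closest to the face (largest value).
    np.maximum.at(buf, (ij[:, 0], ij[:, 1]), vals)
    return buf
```

`np.maximum.at` performs an unbuffered per-cell maximum, so several sample points falling into the same cell correctly keep only the closest one.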

This method works well if the 3D model does not contain a significant number of outliers. Otherwise, the faces of the bounding cube may lie too far from the actual surface of the 3D model (they will be close only to the few outlier points). The values of almost all cells on a face of the bounding cube will then be similar, except near the outliers, which degrades the computed descriptor.

To avoid this problem, Vranic [106] suggests using a canonical cube that does not necessarily enclose the 3D model. The canonical cube is defined by a parameter t > 0, such that its vertices are the points {(x, y, z) | x, y, z ∈ {−t, t}}. The part of the 3D model that lies outside the canonical cube is not used for computing the descriptor, so any outlier point is effectively ignored.
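The canonical-cube clipping amounts to a simple filter on the surface samples. A minimal sketch (the function name is our own; the depth buffers would then be rasterized over the fixed cube of width w = 2t):

```python
import numpy as np

def clip_to_canonical_cube(points, t):
    """Discard surface samples outside the canonical cube [-t, t]^3.

    Isolated outlier points beyond the cube can no longer inflate the
    bounding volume used for the depth buffers.
    """
    keep = np.all(np.abs(points) <= t, axis=1)
    return points[keep]

pts = np.array([[0.1, 0.2, 0.0],
                [0.4, -0.3, 0.2],
                [9.0, 9.0, 9.0]])   # a far-away outlier point
clean = clip_to_canonical_cube(pts, t=0.5)
```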

7.3.1.2 Obtaining the Feature Vector

The values associated with the cells on each face of the bounding cube could be used directly as the attributes of the feature vector. This feature vector would have a dimensionality of 6n². However, such a descriptor may lead to poor retrieval effectiveness [106]. Instead, the depth-buffer descriptor transforms the values from the spatial domain to the frequency domain, and then selects some of the obtained coefficients to form the final descriptor.

The depth-buffer descriptor computes the 2D discrete Fourier transform of each depth buffer. Briefly, the 2D discrete Fourier transform of an n × n array of values is defined as

F(u, v) = (1/n) Σ_{x=0}^{n−1} Σ_{y=0}^{n−1} f(x, y) e^{−2πi(xu+yv)/n},

where f(x, y), 0 ≤ x, y ≤ n − 1, is the value of the cell indexed by the tuple (x, y). With this definition, it is easy to recover the original values f(x, y):

f(x, y) = (1/n) Σ_{u=0}^{n−1} Σ_{v=0}^{n−1} F(u, v) e^{2πi(xu+yv)/n}.

The presented formula for F(u, v) takes O(n⁴) time overall (O(n²) operations must be applied for each cell of the n × n grid), and it must be computed for each face of the bounding cube, so it is computationally expensive. However, if n is a power of two, the Fast Fourier Transform (FFT) can be applied to speed up the computation of the coefficients, reducing the time complexity to O(n² log n). For this purpose, Vranic [106] recommends setting n = 256.
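The transform step can be sketched with numpy's FFT (the division by n mirrors the 1/n normalization in the definition above; the buffer values here are random placeholders):

```python
import numpy as np

n = 256                        # power of two, as recommended by Vranic
buf = np.random.rand(n, n)     # one depth buffer (placeholder values)

# O(n^2 log n) transform; divide by n to match the 1/n normalization.
F = np.fft.fft2(buf) / n
# Move the zero-frequency coefficient to cell (n/2, n/2), centering the
# low frequencies in the middle of the spectrum image.
F = np.fft.fftshift(F)
```

Applying `fftshift` to the result is equivalent to the alternative of shifting the input buffer before the transform.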

Before computing the Fourier coefficients, the buffer is shifted so that the value f(0, 0) is aligned with the cell (n/2, n/2). In this way, the computed low-frequency Fourier coefficients correspond to those located in the middle of the resulting image (pixels with values F(u, v)). As the inputs of the 2D discrete Fourier transform are real values, the obtained coefficients satisfy a symmetry property:

F(u, v) = F*(u′, v′), with (u + u′) mod n = (v + v′) mod n = 0,

where F*(u′, v′) denotes the complex conjugate of F(u′, v′).
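The symmetry property is easy to verify numerically; a small sketch using numpy's unnormalized FFT (the missing 1/n factor does not affect the symmetry):

```python
import numpy as np

n = 8
f = np.random.rand(n, n)       # real-valued input, like a depth buffer
F = np.fft.fft2(f)

# Conjugate symmetry of the DFT of real input: F(u, v) equals the
# complex conjugate of F(u', v') whenever
# (u + u') mod n = (v + v') mod n = 0.
for u in range(n):
    for v in range(n):
        up, vp = (-u) % n, (-v) % n
        assert np.isclose(F[u, v], np.conj(F[up, vp]))
```

This property means that roughly half of the coefficients carry no additional information, which is exploited when selecting the descriptor attributes.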


Fig. 7.2 Depth-buffer renderings. The top row shows the depth buffers of the 3D model. The bottom row shows the corresponding coefficient magnitudes of the 2D Fourier transform. Figure courtesy of [27]

After computing the Fourier coefficients, the final depth-buffer descriptor is formed as follows. First, a parameter value k ∈ N is chosen. Then, the index pairs (p, q) are determined that satisfy the inequality

|p − n/2| + |q − n/2| ≤ k < n/2.

The magnitudes of the coefficients F(p, q) correspond to the attributes of the final feature vector. The inequality selects 2k² + 2k + 1 coefficients, but by the symmetry property only k² + k + 1 of them are distinct, so this is the number of attributes kept per face. As the process is repeated for each face of the bounding cube, the final dimensionality of the descriptor is 6(k² + k + 1). Vranic [106] recommends setting k = 8, thus obtaining a feature vector of 438 dimensions.
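The selection step can be sketched as a mask over the shifted spectrum (function name ours; `F_shifted` is an n × n spectrum with the low frequencies centred, and k < n/2):

```python
import numpy as np

def select_coefficients(F_shifted, k):
    """Magnitudes of the low-frequency coefficients with
    |p - n/2| + |q - n/2| <= k, keeping one of each conjugate pair."""
    n = F_shifted.shape[0]
    c = n // 2
    p, q = np.indices((n, n))
    diamond = np.abs(p - c) + np.abs(q - c) <= k
    # Conjugate pairs reflect through the centre (n/2, n/2); taking the
    # half-plane p > n/2 plus half of the row p = n/2 keeps exactly one
    # coefficient per pair: k^2 + k + 1 of the 2k^2 + 2k + 1 in the diamond.
    half = (p > c) | ((p == c) & (q >= c))
    return np.abs(F_shifted[diamond & half])

# One face: k = 8 gives 8^2 + 8 + 1 = 73 attributes; 6 faces give 438.
spectrum = np.fft.fftshift(np.fft.fft2(np.random.rand(32, 32))) / 32
face_features = select_coefficients(spectrum, k=8)
```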

Figure 7.2 shows the depth buffer renderings for a 3D model of a car. The first row of images shows the depth buffers of the 3D model. Darker pixels indicate that the distance between the view plane and the object is smaller than at brighter pixels. The second row shows coefficient magnitudes of the 2D Fourier transform of the six images.

7.3.1.3 Evaluation

The effectiveness of the depth-buffer descriptor was compared with several feature-based descriptors for 3D model retrieval [27]. The experimental evaluation showed that descriptors based on 2D projections of the 3D model can be more effective than other global descriptors. In particular, the depth-buffer descriptor achieved the highest average effectiveness among all descriptors for queries on a heterogeneous 3D model dataset (the Konstanz 3D Model Database). The dataset contained 1838 3D objects collected from the Internet; 472 of them were manually classified and used as queries. Table 7.3 shows the results obtained in the experiments, using the R-Precision measure.
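For reference, R-Precision is the precision after retrieving exactly R results, where R is the number of objects relevant to the query; a minimal sketch:

```python
def r_precision(ranked, relevant):
    """Precision over the first R retrieved items, with R = |relevant|."""
    R = len(relevant)
    hits = sum(1 for item in ranked[:R] if item in relevant)
    return hits / R

# Two relevant objects for the query; one appears in the top two results.
score = r_precision(['a', 'x', 'c', 'b'], {'a', 'b'})
```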

An advantage of the depth-buffer technique is its low computational cost. In addition, for global retrieval it is the most effective of the evaluated techniques. However, as can be noted, the method is not suited to problems such as partial matching and non-rigid retrieval. For more details about the evaluated techniques, we refer the reader to the original paper.