
1 Introduction
R. Koch et al.

1.5.9 3D Local Shape Descriptors: Spin Images

The ICP algorithm fails if it converges to a local minimum that is not the global minimum of the least-squares registration error. A common approach to preventing this is to determine a sparse set of three or more strong local descriptor (feature) matches across the pair of 3D shapes, which allows a coarse 3D registration to within the convergence basin of ICP. Probably the best-known 3D local shape descriptor is the spin image [26], presented by Johnson and Hebert in 1997. Here the local normal of a 3D point is used to encode neighboring points by measuring their height in the direction of the normal and their radius in the tangential plane defined by the normal. Thus a spin image encodes the relative positions of neighboring points in a cylindrical-polar coordinate system. The neighbors' angles in the tangential plane are discarded in order to give pose invariance to the descriptor, and the height and radius values of the neighbors are accumulated into a two-dimensional histogram, which forms the spin image descriptor. A large number of experiments in the literature have shown that spin images are effective for several tasks, including the registration of overlapping shapes, 3D object recognition and 3D shape retrieval (shape search).
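As a sketch of the idea, the following Python snippet accumulates a point's neighbors into a (height, radius) histogram. The 5 mm bin size and 18 × 9 dimensions follow the example in Fig. 1.8, but the axis conventions used here (rows for height with "above" at the top, columns for radius) and the clipping of out-of-range neighbors are illustrative assumptions rather than a fixed part of the method:

```python
import numpy as np

def spin_image(p, normal, points, bin_size=0.005, n_beta=18, n_alpha=9):
    """Accumulate neighbors of the oriented point (p, normal) into a 2D histogram.

    beta:  signed height of each neighbor along the normal.
    alpha: radius of each neighbor in the tangential plane.
    Rows index beta (top row = highest above the point), columns index alpha.
    """
    n = normal / np.linalg.norm(normal)
    d = points - p
    beta = d @ n
    alpha = np.sqrt(np.maximum((d ** 2).sum(axis=1) - beta ** 2, 0.0))
    # map beta = +half-height to row 0 and beta = -half-height to the last row
    rows = np.floor((0.5 * n_beta * bin_size - beta) / bin_size).astype(int)
    cols = np.floor(alpha / bin_size).astype(int)
    img = np.zeros((n_beta, n_alpha))
    keep = (rows >= 0) & (rows < n_beta) & (cols >= 0) & (cols < n_alpha)
    np.add.at(img, (rows[keep], cols[keep]), 1.0)  # histogram accumulation
    return img

# toy example: a flat patch of 81 points with normal +z collapses onto one row
xs = np.linspace(-0.02, 0.02, 9)
patch = np.array([[x, y, 0.0] for x in xs for y in xs])
img = spin_image(np.zeros(3), np.array([0.0, 0.0, 1.0]), patch)
```

Because every point of the flat patch has zero height along the normal, all 81 neighbors fall into a single row of the histogram; a curved neighborhood would spread mass across several rows, which is what makes the descriptor discriminative.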

Figure 1.8 shows some examples of spin images computed on 3D captures of human faces [11]. In this case, the spin images are taken over a limited local range, as the 3D face surfaces are partial scans taken from a single viewpoint (there is no scanned surface for the back of the head). In the figure, the spin images for a given landmark appear quite similar across two different faces. For complete 3D scans, it is possible for spin images to encode the full extent of the object.

1.5.10 Passive 3D Imaging: Flexible Camera Calibration

Camera calibration is the process whereby the intrinsic camera parameters are established, such as the focal length of the lens and the size and aspect ratio of the image sensor pixels. The position and orientation of the camera relative to the scene are also established, and the parameters describing these are referred to as extrinsic parameters. Many current approaches to camera calibration are based on the easy-to-use, yet accurate, approach presented by Zhang in 2000,22 where calibration can be achieved from n views of a calibration grid of known dimensions [57]. This calibration grid is a planar 'chessboard' pattern of alternating black and white squares, which can be freely moved as the calibration images are captured; knowledge of the motion between captures is not required, hence the system is easy to use. Although the minimum number of captured images is two, around 20 are commonly used for improved accuracy. The estimation proceeds in two stages: firstly, a closed-form linear solution for the camera's parameters is computed; this is followed by a non-linear refinement based on the maximum-likelihood criterion. In the first stage, lens distortion is assumed to be zero, whereas the second stage provides a mechanism for radial distortion parameters to be estimated, if required. Figure 1.9 illustrates a typical set of calibration plane positions used to calibrate a camera; a standard corner detector is used to find each junction of four squares in each chessboard image. The same corner position input data can be used to calibrate a stereo rig, whereby two sets of intrinsic parameters are established and the extrinsic parameters define the 6 degree-of-freedom rigid pose of one camera relative to the other.

22 Zhang's seminal work is pre-dated by a large body of pioneering work on calibration, such as D.C. Brown's work in the context of photogrammetry, which dates back to the 1950s, and many other works in computer vision, such as the seminal two-stage method of Tsai [53].

Fig. 1.8 Example spin images computed for 14 landmarks on two different faces from the FRGC dataset. Here a bin size of 5 mm is used and the size of the spin image is 18 × 9 pixels. The middle of the top edge of the spin image corresponds to the 3D surface point whose local shape is being encoded; the left part of the spin image corresponds to points above this 3D point in the direction of the normal, and the right part to points below, using this same direction. The vertical direction in the spin image corresponds to the radius in the tangential plane. Figure adapted from [11], courtesy of Clement Creusot

Fig. 1.9 Left: calibration targets used in a camera calibration process, images courtesy of Hao Sun. Right: after calibration, it is possible to determine the positions of the calibration planes using the estimated extrinsic parameters. Figure generated by the Camera Calibration Toolbox for Matlab (http://www.vision.caltech.edu/bouguetj/calib_doc/, accessed 22nd Dec 2011; page maintained by Jean-Yves Bouguet)
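The closed-form first stage can be sketched as follows. The snippet below is a minimal numpy illustration on noise-free synthetic homographies, not Zhang's full pipeline: each view contributes two linear constraints on the image of the absolute conic B = K^-T K^-1, the stacked system is solved by SVD, and the intrinsic matrix K is read off B in closed form. In real use the homographies would be estimated from detected chessboard corners and the result refined non-linearly; all function names here are our own.

```python
import numpy as np

def v_ij(H, i, j):
    # row of the linear system such that v_ij . b = h_i^T B h_j,
    # with b = (B11, B12, B22, B13, B23, B33) and h_i the i-th column of H
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0]*hj[0],
                     hi[0]*hj[1] + hi[1]*hj[0],
                     hi[1]*hj[1],
                     hi[2]*hj[0] + hi[0]*hj[2],
                     hi[2]*hj[1] + hi[1]*hj[2],
                     hi[2]*hj[2]])

def intrinsics_from_homographies(Hs):
    # each homography yields h1^T B h2 = 0 and h1^T B h1 = h2^T B h2
    V = []
    for H in Hs:
        V.append(v_ij(H, 0, 1))
        V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))
    V = np.array(V)
    scale = np.linalg.norm(V, axis=0)            # column scaling for conditioning
    b = np.linalg.svd(V / scale)[2][-1] / scale  # null space of V
    if b[0] < 0:
        b = -b                                   # fix the sign of the overall scale
    B11, B12, B22, B13, B23, B33 = b
    # closed-form extraction of the intrinsics from B (Zhang 2000)
    v0 = (B12*B13 - B11*B23) / (B11*B22 - B12**2)
    lam = B33 - (B13**2 + v0*(B12*B13 - B11*B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11*B22 - B12**2))
    gamma = -B12 * alpha**2 * beta / lam
    u0 = gamma*v0/beta - B13*alpha**2/lam
    return np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])

def rot(ax, ay):
    # tilt of the calibration plane about the x and y axes
    cx, sx, cy, sy = np.cos(ax), np.sin(ax), np.cos(ay), np.sin(ay)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    return Ry @ Rx

# synthetic ground truth: a zero-skew camera and four tilted views of the plane
K = np.array([[800.0, 0.0, 320.0], [0.0, 780.0, 240.0], [0.0, 0.0, 1.0]])
views = [((0.3, 0.0), [0.1, 0.0, 3.0]), ((0.0, 0.4), [-0.2, 0.1, 4.0]),
         ((0.25, -0.35), [0.0, 0.2, 3.5]), ((-0.2, 0.3), [0.15, -0.1, 2.5])]
Hs = []
for (ax, ay), t in views:
    R = rot(ax, ay)
    H = K @ np.column_stack((R[:, 0], R[:, 1], t))  # H = K [r1 r2 t] for plane Z=0
    Hs.append(H / H[2, 2])                           # homographies are up to scale

K_est = intrinsics_from_homographies(Hs)
```

With noise-free homographies the recovered intrinsics match the ground truth to numerical precision; with corner detections from real images, this closed-form estimate is the starting point for the maximum-likelihood refinement described above.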

1.5.11 3D Shape Matching: Heat Kernel Signatures

One problem with spin images and other local shape descriptors is that they are encoded in Euclidean space and are therefore only valid for locally rigid shapes. Understanding shapes in terms of their geodesic distances23 can yield more generic approaches that are not degraded by bending of the shape (i.e. they are invariant to isometric deformation). In 2009, Sun et al. [48] presented a multi-scale shape signature with this isometry invariance. It is based on the properties of the heat diffusion process over a meshed surface and belongs to a class of methods known as diffusion geometry approaches. The concise signature is obtained by restricting the heat kernel to the temporal domain. This technique and other diffusion geometry approaches enable high-performance 3D shape matching under isometric deformation and are thus finding their way into 3D object recognition and 3D shape retrieval (search) applications. Figure 1.10 shows heat kernel signatures extracted on two instances of an isometrically deformed shape.

23 The geodesic distance between two points on a surface is the length of the shortest path between them that lies on the surface.
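A minimal illustration of the signature: on a discrete domain, the heat kernel diagonal k_t(x, x) = Σ_i e^(-λ_i t) φ_i(x)² is computed from the eigendecomposition of a Laplacian, and sampling it at a few diffusion times t gives a per-vertex signature. The sketch below uses the graph Laplacian of a simple path graph purely to stay self-contained; on a real mesh one would use a discrete (e.g. cotangent) Laplacian of the surface.

```python
import numpy as np

def heat_kernel_signature(L, times):
    """Per-vertex HKS: k_t(x, x) = sum_i exp(-lambda_i * t) * phi_i(x)**2."""
    evals, evecs = np.linalg.eigh(L)             # L is symmetric positive semi-definite
    return np.array([(np.exp(-evals * t) * evecs ** 2).sum(axis=1)
                     for t in times]).T          # shape: (n_vertices, n_times)

def path_laplacian(n):
    # graph Laplacian of a path with n vertices (a stand-in for a mesh Laplacian)
    A = np.zeros((n, n))
    idx = np.arange(n - 1)
    A[idx, idx + 1] = A[idx + 1, idx] = 1.0
    return np.diag(A.sum(axis=1)) - A

hks = heat_kernel_signature(path_laplacian(7), times=[0.1, 1.0, 10.0])
```

The two endpoints of the path are related by an isometry (reversing the path), so their signatures agree exactly, while an interior vertex receives a different signature; this is the invariance property exploited when matching deformed shapes.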


Fig. 1.10 Heat kernel signatures calculated on two isometric shapes. The top row shows that signatures at corresponding points look very similar. The bottom row shows that signatures at different points on the mesh differ. Figure reproduced from Chap. 7, courtesy of Benjamin Bustos

1.5.12 A Seminal Application: Real-Time Human Pose Recognition

In 2011, Shotton et al. [46] presented a system that was able to segment the whole body into a set of parts based on a single range image (i.e. no temporal information is used) and thereby determine, in real-time, the position of human body joints. This process is illustrated in Fig. 1.11. A design aim was high frame rates and they achieved 200 frames per second on consumer hardware. The system can run comfortably using an inexpensive 3D camera and the algorithm forms a core component

Fig. 1.11 From a single depth image, range pixels can be labeled as belonging to certain body parts, which are colored differently in the figure. From this labeling, 3D joint positions can be inferred. Figure courtesy of [46]
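The per-pixel labeling in [46] is driven by randomized decision forests over very simple depth-comparison features: each feature probes the depth at two offsets from the pixel, with the offsets scaled by the inverse depth at that pixel so the response is invariant to how far the body is from the camera. A minimal sketch of one such feature follows (the function and constant names are our own, and the forest training itself is not shown):

```python
import numpy as np

def depth_feature(depth, px, u, v, background=1000.0):
    """Depth-comparison feature f = d(px + u/d(px)) - d(px + v/d(px)).

    px is a (row, col) pixel; u and v are 2D offsets whose division by the
    depth at px makes the feature depth-invariant. Probes that fall outside
    the image read as a large 'background' depth.
    """
    d0 = depth[px]
    def probe(off):
        r = px[0] + int(round(off[0] / d0))
        c = px[1] + int(round(off[1] / d0))
        if 0 <= r < depth.shape[0] and 0 <= c < depth.shape[1]:
            return depth[r, c]
        return background
    return probe(u) - probe(v)

# toy depth map: a near surface (2 m) on the left, a far one (4 m) on the right
depth = np.full((10, 10), 4.0)
depth[:, :5] = 2.0
f = depth_feature(depth, (5, 4), u=(0.0, 2.0), v=(0.0, -2.0))
```

Thresholding many such responses at the internal nodes of the decision forest yields the per-pixel body-part labels from which the 3D joint positions are then inferred.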