
8 3D Face Recognition

Chapter Outline The remainder of this chapter is structured as follows: Sect. 8.2 gives an overview of how facial surfaces are represented and visualized. More detail on these aspects can be found in Chap. 4, but the overview here keeps this chapter more self-contained. Section 8.3 gives an overview of the datasets that researchers have used to conduct comparative performance evaluations. Section 8.4 presents the types of performance evaluation that are made for both verification and recognition. Section 8.5 looks at a typical processing pipeline for a 3D face recognition system, including face detection, spike removal, hole filling, smoothing, pose correction, resampling, feature extraction and classification. Sections 8.6 to 8.9 give a tutorial presentation of a set of well-established techniques for 3D face recognition, including ICP, holistic subspace-based methods (PCA, LDA) and curvature-based methods. Section 8.10 presents more recent state-of-the-art approaches, including annotated face models, local feature-based methods and expression-invariant 3D face recognition. Towards the end of the chapter, we present future challenges for 3D face recognition systems, conclusions, suggested further reading, and questions and exercises for the reader.

8.2 3D Face Scan Representation and Visualization

The fundamental measurement provided by most 3D cameras is a set of 3D point coordinates, p_i = [x_i, y_i, z_i]^T, described in some defined 3D camera frame. Since we are concerned with shape, which is encoded by the relative depths of points and not their absolute depths, knowledge of the exact definition of this frame is usually unimportant. If the set of points is unordered, it is termed a point cloud, and there is no neighborhood or surface connectivity information, explicit or implicit, in the representation.
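As a concrete illustration (a minimal sketch, not taken from the book), a point cloud can simply be stored as an N x 3 array of coordinates:

```python
import numpy as np

# A point cloud is an unordered set of 3D points: one row per point,
# columns x, y, z, expressed in some (usually arbitrary) camera frame.
points = np.array([
    [0.01, -0.02, 0.55],
    [0.03,  0.00, 0.56],
    [0.02,  0.01, 0.54],
])  # shape (N, 3); row order carries no neighborhood information

# Because shape is encoded by *relative* depths, translating the frame
# origin (e.g. centering on the centroid) leaves the shape unchanged.
centered = points - points.mean(axis=0)
```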

In contrast, a range image may be generated, which is a 2D arrangement of the depth (Z coordinate) values of the scene corresponding to the pixels of the image plane of the 3D sensor. A range image retains only the Z coordinates of the sensed points; the corresponding (X, Y) coordinates can be obtained from the (calibrated) camera model parameters. It is possible to convert a point cloud representation to a centrally projected range image or to an orthogonally projected depth map (see Fig. 8.1), which is similar to a range image but resampled with orthogonal projection.
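The sketch below illustrates one simple way to resample a point cloud into an orthogonally projected depth map. It is an assumption-laden toy implementation, not the book's method: it assumes z increases towards the camera, and it keeps the closest point when several points fall into the same pixel.

```python
import numpy as np

def point_cloud_to_depth_map(points, resolution=1.0):
    """Resample an (N, 3) point cloud into an orthographic depth map.

    Each point is binned into a regular (x, y) grid. Where several points
    land in the same pixel, the one closest to the camera wins (here we
    assume larger z means closer to the camera).
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    col = np.floor((x - x.min()) / resolution).astype(int)
    row = np.floor((y - y.min()) / resolution).astype(int)
    depth = np.full((row.max() + 1, col.max() + 1), np.nan)  # NaN = hole
    for r, c, d in zip(row, col, z):
        if np.isnan(depth[r, c]) or d > depth[r, c]:
            depth[r, c] = d
    return depth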

Since only range is retained in a range image, the inverse conversion is possible only if the camera model (i.e. projection model) is given, so that the correct (X, Y) coordinates can be computed for each range image pixel.
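For a centrally projected range image, the standard pinhole camera model gives this inverse directly: X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy, where (fx, fy) are the focal lengths and (cx, cy) the principal point in pixels. A minimal sketch, assuming these intrinsics are known from calibration:

```python
import numpy as np

def range_image_to_points(depth, fx, fy, cx, cy):
    """Back-project a range image to 3D points using a pinhole model."""
    v, u = np.indices(depth.shape)   # row (v) and column (u) pixel grids
    Z = depth
    X = (u - cx) * Z / fx
    Y = (v - cy) * Z / fy
    valid = np.isfinite(Z)           # skip holes / missing range values
    return np.column_stack((X[valid], Y[valid], Z[valid]))
```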

The software supplied with many stereo-based 3D cameras augments the 3D point data with surface connectivity information and thus provides the user with a polygonal mesh (e.g. in the OBJ format). This makes the object scan more suitable for rendering and processing. Mesh representations are common in computer graphics and are generated by constructing polygons from neighboring points such that each polygon is planar and simple (i.e. has non-intersecting edges).


Fig. 8.1 Facial depth maps: the top row shows the captured pose when the subject has been asked to move their head 45 degrees relative to frontal. The rendering is the same as for a range image, i.e. brighter pixels are closer to the 3D camera. The bottom row shows resampled depth maps after a pose normalization process has been applied to the captured point cloud. Figure adapted from [72]

Fig. 8.2 A 3D face mesh (rendered as a flat-shaded view in MeshLab [68]) is decimated three times using the quadric error algorithm [37]

Both of these constraints are always satisfied by triangles; therefore, most meshes are composed entirely of triangles. Delaunay triangulation is a commonly used method for generating triangular meshes from point clouds.
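For a single-view 2.5D scan such as a face range image, Delaunay triangulation can be applied to the (x, y) projection of the points, carrying z along as the third vertex coordinate. A minimal sketch using SciPy (an illustration, not the book's implementation):

```python
import numpy as np
from scipy.spatial import Delaunay

def triangulate_depth_points(points):
    """Build a triangle mesh from an (N, 3) single-view point cloud.

    Triangulating the (x, y) projection yields triangles, which are
    always planar and simple (non-self-intersecting).
    """
    tri = Delaunay(points[:, :2])    # 2D Delaunay on image-plane coords
    return points, tri.simplices     # vertices (N, 3), faces (M, 3)
```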

Range sensors sample a surface according to its orientation relative to the sensor and the sensor's perspective view. Therefore, surfaces that are closer and orthogonal to the sensor are sampled more densely than surfaces that are farther away or oblique to the sensor. For efficient memory utilization and rendering, points/vertices in the oversampled parts of the scene are removed using mesh decimation [42]. Mesh decimation works on the principle of collapsing the edges that would least alter the surface geometry; hence, the decimated mesh contains only a fraction of the points/polygons and yet retains the original 3D shape. Figure 8.2 shows a sample 3D face decimated three times using the quadric error algorithm [37].
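Quadric-error-metric decimation of this kind is available in common mesh-processing libraries. The sketch below uses Open3D to halve the triangle count three times, mirroring the three decimation steps of Fig. 8.2; the library choice and the input filename are illustrative assumptions, not part of the original text.

```python
import open3d as o3d  # assumes the Open3D library is installed

# Load a face scan (hypothetical filename) and repeatedly halve its
# triangle count with quadric-error-metric decimation, the same family
# of algorithm as [37].
mesh = o3d.io.read_triangle_mesh("face_scan.obj")
for _ in range(3):
    target = len(mesh.triangles) // 2
    mesh = mesh.simplify_quadric_decimation(target_number_of_triangles=target)
    print(f"{len(mesh.vertices)} vertices, {len(mesh.triangles)} triangles")
```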

Polygonal models are linear approximations of surfaces by flat polygons. A more accurate mathematical representation is given by non-uniform rational B-splines (NURBS). In addition to having greater accuracy, a NURBS representation is also more compact than the polygonal representation. NURBS represent