

As a consequence of this, many variants of ICP try to reduce the face scan match time. Coarse-to-fine resolution schemes can be used, and we can precompute as many aspects of the algorithm as possible on the whole gallery in an offline batch process. Examples include extracting fiducial points for coarse registration, cropping to spherical volumes relative to the nose tip, building k-d trees, and placing the gallery scans in voxel structures for fast lookup of closest points.
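For instance, the offline k-d tree construction might look as follows in Python (a minimal sketch; the function name and data layout are illustrative, and SciPy's cKDTree is just one of many publicly available implementations):

    import numpy as np
    from scipy.spatial import cKDTree

    def precompute_gallery_trees(gallery_scans):
        """Offline batch step: build one k-d tree per gallery scan so that
        closest-point queries at match time cost O(log n) per probe point."""
        return [cKDTree(np.asarray(scan)) for scan in gallery_scans]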

The ICP algorithm can accurately match rigid surfaces. However, faces are not rigid and the facial surface can deform significantly under expressions. Consequently, the performance of standard ICP degrades under facial expressions. For example, one study cropped the 3D face region manually and then applied standard ICP to the neutral and non-neutral subsets of the FRGC dataset in a rank-1 recognition test: the neutral subset gave an average result of 91 %, while the non-neutral subset gave only 61.5 % [19].

However, a second advantage of ICP is that it can operate in partial surface matching schemes. Thus the problem of facial expressions can be significantly mitigated by applying ICP to the relatively rigid regions of the face [19, 64], which can be identified in the gallery scans in an offline batch process. The ability to do partial matching also allows ICP to handle pose variations by matching 2.5D scans to complete face models [59]. In the case of large pose variations, coarse prealignment using fiducial points (landmarks) may be necessary.

8.6.3 A Typical ICP-Based 3D Face Recognition Implementation

We now outline typical steps involved in a standard ICP-based 3D face recognition application. This is intended as a guide on how to implement and start using this approach, but please note that there are many variants of this algorithm available in the literature. A MATLAB implementation of ICP can be downloaded and adapted as necessary from [63].

We assume that the probe and gallery faces are near-frontal in pose. Some minor pose variations are allowable, such as is found in the FRGC v2 dataset and as might be seen in a typical cooperative verification application. Padia and Pears [69] show that, when registering a 3D face scan to an average face model, ICP converges to the correct global minimum for initial misalignments between the scans of at least 30 degrees about any of three orthogonal rotation axes. We preprocess each gallery scan according to the steps below; a code sketch of these steps follows the list.

1. Determine the closest vertex to the camera. In most reasonable-quality scans, this will be close to the nose tip. (Occasionally, the chin, lips or forehead can be closest to the camera and, in a first implementation, a quick visual check may be required so that the nose tip can be selected manually for these failure cases.)

2. Crop to a spherical region of radius 100 mm around this point. For smaller faces this may include some neck and hair area.

3. Filter spikes and interpolate over any holes.


4. Compute the mean of the point cloud and perform a zero-mean operation (i.e. subtract the mean from each vertex).

5. Use an off-the-shelf algorithm to organize each gallery scan into a k-d tree. Many implementations are publicly available on the web.
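As a concrete illustration, steps 1–4 might be implemented as follows (a minimal Python sketch; it assumes each scan is an (N, 3) NumPy array of vertices in millimeters viewed along the z-axis, and the spike and hole filtering of step 3 is omitted):

    import numpy as np
    from scipy.spatial import cKDTree

    def preprocess_scan(vertices, crop_radius=100.0):
        """Steps 1-4 for one scan; vertices is an (N, 3) array in mm."""
        # Step 1: closest vertex to the camera as a nose-tip proxy
        # (assumes the camera looks down the +z axis, so smaller z is closer).
        nose_tip = vertices[np.argmin(vertices[:, 2])]
        # Step 2: crop to a spherical region of radius 100 mm.
        keep = np.linalg.norm(vertices - nose_tip, axis=1) <= crop_radius
        cropped = vertices[keep]
        # Step 3: spike removal and hole interpolation would go here.
        # Step 4: zero-mean the point cloud.
        return cropped - cropped.mean(axis=0)

    # Step 5: organize the preprocessed gallery scan into a k-d tree.
    # tree = cKDTree(preprocess_scan(gallery_vertices))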

For each probe scan to be matched to the gallery, we follow the steps below; a code sketch of the complete matching loop is given after the list.

1. Perform the processing steps 1–4 described for the gallery scans above. Given that both probe and gallery face scans are now zero-mean, this constitutes an initial coarse translational alignment.

2. Use a standard off-the-shelf algorithm to perform a closest-point search in the k-d tree of the gallery scan, for each point in the probe scan.

3. Delete tentative correspondences according to the filters given in the earlier 4-point list. (Use the distance and surface-normal filters, at least.)

4. From the tentative correspondences, form the cross-covariance matrix using Eq. (8.7). (Note that the means used for this matrix are associated with the list of filtered tentative correspondences, not the full scans.)

5. Perform SVD on the cross-covariance matrix and hence extract the rotation matrix, R, according to Eq. (8.9).

6. Compute the translation, t, using Eq. (8.10).

7. Update the alignment of the probe scan with the gallery scan using Eq. (8.11).

8. Compute e and, unless on the first iteration, determine the change in e from the previous iteration. If this change is below a threshold, or if the maximum number of iterations has been reached, finish. Otherwise go to step 2.
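A minimal Python sketch of this loop is given below. It assumes that Eqs. (8.7)–(8.11) take the standard least-squares form (cross-covariance, SVD with a determinant correction, centroid-difference translation); the surface-normal filter of step 3 is omitted, and names such as icp_match and dist_reject are illustrative:

    import numpy as np
    from scipy.spatial import cKDTree

    def icp_match(probe, gallery_pts, gallery_tree, max_iter=50, tol=1e-6,
                  dist_reject=10.0):
        """One probe/gallery match; both point sets are zero-mean (N, 3)
        arrays and gallery_tree is a cKDTree built over gallery_pts."""
        P = probe.copy()
        prev_e = np.inf
        for _ in range(max_iter):
            # Step 2: closest-point search in the gallery k-d tree.
            d, idx = gallery_tree.query(P)
            # Step 3: reject tentative correspondences by distance
            # (a surface-normal consistency test should also be applied).
            keep = d < dist_reject
            p, g = P[keep], gallery_pts[idx[keep]]
            # Step 4: cross-covariance from the means of the filtered
            # correspondence lists, not the full scans.
            pm, gm = p.mean(axis=0), g.mean(axis=0)
            C = (p - pm).T @ (g - gm)
            # Step 5: rotation from the SVD of the cross-covariance;
            # the diagonal correction D guards against reflections.
            U, _, Vt = np.linalg.svd(C)
            D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
            R = Vt.T @ D @ U.T
            # Step 6: translation aligning the correspondence centroids.
            t = gm - R @ pm
            # Step 7: update the alignment of the probe scan.
            P = P @ R.T + t
            # Step 8: RMS residual e and convergence test.
            e = np.sqrt(np.mean(np.sum((P[keep] - g) ** 2, axis=1)))
            if abs(prev_e - e) < tol:
                break
            prev_e = e
        return e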

The smallest final value of e determines the best match in the gallery for rank-1 identification, although if e is not sufficiently low, it can be concluded that the probe subject is not present in the gallery. Alternatively, in a verification test, the e value can be computed from a single match against the face scan of the claimed gallery identity and thresholded.
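In code, this decision step might look as follows (illustrative names; the threshold would be chosen on validation data):

    import numpy as np

    def identify(residuals, threshold=None):
        """Rank-1 identification: residuals[i] is the final e against
        gallery scan i; with a threshold, None means 'not in gallery'."""
        best = int(np.argmin(residuals))
        if threshold is not None and residuals[best] > threshold:
            return None
        return best

    def verify(e_claimed, threshold):
        """Verification against a claimed identity: threshold the e
        obtained from the single gallery face scan match."""
        return e_claimed <= threshold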

Once the basic implementation described above is operational, there are several immediate improvements that can be made. For those readers who want to improve the implementation, we suggest the following.

The final number of correspondences may vary, and this number may be included, along with e, in a cost function in order to make the verification or identification decision. (That is, a slightly higher e value could be preferable if it is accompanied by a significantly higher number of correspondences.)
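One possible (purely illustrative) form of such a cost, in which a larger number of surviving correspondences can offset a slightly higher residual:

    def combined_cost(e, n_correspondences, lam=0.01):
        """Lower cost is better; lam is a hypothetical weight that
        would be tuned on a validation set."""
        return e - lam * n_correspondences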

Particularly for large datasets, where a fast per-scan match is required, it is preferable to construct a voxel space around each gallery scan, storing at the center of each voxel the index of the nearest gallery surface point. Then, for any probe point, we determine the voxel in which it lies and simply look up the corresponding gallery surface point [91].
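A sketch of this precomputed lookup table is given below; the voxel size and margin are illustrative parameters trading memory against accuracy, and SciPy is used only for the offline nearest-neighbor pass:

    import numpy as np
    from scipy.spatial import cKDTree

    def build_voxel_lookup(gallery_pts, voxel_size=2.0, margin=20.0):
        """Store, for the center of every voxel in a grid around the
        gallery scan, the index of its nearest gallery surface point."""
        lo = gallery_pts.min(axis=0) - margin
        hi = gallery_pts.max(axis=0) + margin
        dims = np.ceil((hi - lo) / voxel_size).astype(int)
        axes = [lo[i] + (np.arange(dims[i]) + 0.5) * voxel_size
                for i in range(3)]
        centers = np.stack(np.meshgrid(*axes, indexing="ij"),
                           axis=-1).reshape(-1, 3)
        _, nearest = cKDTree(gallery_pts).query(centers)
        return lo, voxel_size, dims, nearest.reshape(tuple(dims))

    def closest_gallery_index(probe_pt, lo, voxel_size, dims, table):
        """Constant-time lookup of the stored nearest-point index."""
        ijk = np.clip(((probe_pt - lo) / voxel_size).astype(int),
                      0, dims - 1)
        return table[tuple(ijk)]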

When dealing with large pose variations of the probe, such as may be encountered in a non-cooperating subject identification scenario, more sophisticated techniques than the above are required. Cropping the probe based on the nose tip being the nearest point to the camera will often fail and, in profile views, the nose tip will often not be detected by many current methods. Worse still, the probe will often be outside the global-minimum convergence zone of the near-frontal gallery scan poses. To deal with these scenarios, techniques are needed to extract three fiducial points (landmarks) on the probe scans when they are in an arbitrary pose. Then, if a sufficiently wide range of these fiducial points is precomputed on the gallery scans, an initial coarse pose registration is possible, as sketched below. However, reliable landmarking of 3D face scans in arbitrary poses is not trivial and is a focus of current research. For example, Creusot et al. [24] extract keypoints on 3D face scans and then label them from a set of fourteen possible labels [23].
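Given three such corresponding landmarks, the coarse registration can reuse the same SVD construction as the ICP loop, applied once to the landmark triplets (a minimal sketch with illustrative names):

    import numpy as np

    def coarse_align(probe_lms, gallery_lms):
        """Rigid alignment from three matched fiducial points, e.g. the
        inner eye corners and nose tip, as (3, 3) arrays."""
        pm, gm = probe_lms.mean(axis=0), gallery_lms.mean(axis=0)
        U, _, Vt = np.linalg.svd((probe_lms - pm).T @ (gallery_lms - gm))
        D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
        R = Vt.T @ D @ U.T
        return R, gm - R @ pm  # apply to the probe as x -> R @ x + t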

8.6.4 ICP Variants and Other Surface Registration Approaches

Medioni and Waupotitsch [67] used a variant of ICP for 3D face recognition. Unlike other techniques, they acquired the face data using passive stereo. Maurer et al. [61] report the recognition performance of Geometrix ActiveID™, which uses ICP for 3D face recognition. Lu et al. [59] used shape index features along with some anchor points to perform an initial coarse registration of the faces, which was later refined with ICP. They matched partial 2.5D scans to 3D face models in order to deal with large pose variations.

Chang et al. [19] proposed an adaptive rigid multiregion selection (ARMS) approach for ICP-based 3D face recognition. They automatically locate the inner eye corners, nose tip, and bridge of the nose based on mean and Gaussian curvatures. These landmarks are used to define an elliptical region around the nose of the gallery face. For a probe face, these landmarks are used to define multiple overlapping surface regions, which are individually matched to the gallery face using ICP, and the results are combined. The results of Chang et al. show that using smaller regions around the nose can result in better recognition performance. A cross-comparison of their ARMS approach with standard approaches on the FRGC v2 dataset gave a rank-1 performance of 97.1 % on neutral faces and 86.1 % on non-neutral faces, as compared to a PCA performance of 77.7 % (neutral) and 61.3 % (non-neutral), and a standard ICP performance of 91 % (neutral) and 61.5 % (non-neutral).

Mian et al. [64] used a variant of ICP for separately matching the eyes-forehead region and the nose. Their results show that the eyes-forehead region is more robust to facial expressions than the nose. Accurate automatic segmentation of the two regions was performed by first detecting the nose tip, aligning the face using PCA and then detecting the points of inflection around the nose. In their ICP variant, correspondences were established along the z-dimension: point clouds were projected onto the xy-plane before establishing correspondences and reprojected to xyz-space for alignment. Mian et al. [64] argued that correspondences should be forced even between points that are far apart along the viewing direction, as this gives useful information about the dissimilarity between the faces.
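A sketch of this correspondence scheme (illustrative names; it assumes the scans are already coarsely aligned, with the viewing direction along z):

    import numpy as np
    from scipy.spatial import cKDTree

    def correspondences_along_z(probe, gallery):
        """Nearest neighbors are found in the xy-plane only, so large
        z-differences are retained and contribute to the measured
        dissimilarity between the two faces."""
        tree_xy = cKDTree(gallery[:, :2])     # gallery projected to xy
        _, idx = tree_xy.query(probe[:, :2])  # match probe points in xy
        return probe, gallery[idx]            # full xyz points for alignment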

Faltemier et al. [31] aligned the face using the nose tip and selected 28 subregions on the face that remain relatively consistent in the presence of expressions