surfaces continuously in two parametric directions with the help of control points and knots. Details of NURBS and B-splines can be found in [74].
A large number of visualization tools are available for rendering and manipulating 3D data. Most are built using the Open Graphics Library (OpenGL) and can render point clouds, meshes, and filled polygons with constant shading or texture mapping. File formats are likewise numerous, but most follow the same convention: a header followed by the X, Y, Z coordinates of the point cloud and a list of polygons. The header usually contains information about the sensor, rendering parameters, the number of points and polygons, and so on. The polygon list contains the index numbers of the points that form each polygon. A few examples of free visualization and manipulation tools are PCL, MeshLab [68], Plyview, Blender and Scanalyze. Proprietary packages include Maya, Rhino and Poser. Online datasets of 3D faces are also on the increase, and the most well-known of these are discussed in the following section.
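As a concrete illustration of this convention, the following minimal Python sketch writes a triangle mesh in the Wavefront OBJ format, one widely supported format of this kind. In OBJ the "header" is just optional comment lines, and polygon entries list 1-based vertex indices.

```python
# Minimal sketch: writing a mesh in the common
# "header + vertex list + polygon index list" convention,
# using the Wavefront OBJ format as an example.

def write_obj(path, vertices, faces, comment="example mesh"):
    """vertices: list of (x, y, z) tuples; faces: list of 0-based vertex-index tuples."""
    with open(path, "w") as f:
        f.write(f"# {comment}\n")            # OBJ's 'header' is comment lines
        for x, y, z in vertices:
            f.write(f"v {x} {y} {z}\n")      # one vertex per line
        for face in faces:
            idx = " ".join(str(i + 1) for i in face)  # OBJ indices are 1-based
            f.write(f"f {idx}\n")

# Usage example: a single triangle.
write_obj("triangle.obj", [(0, 0, 0), (1, 0, 0), (0, 1, 0)], [(0, 1, 2)])
```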
8.3 3D Face Datasets
A range of 3D face datasets have been employed by the research community for 3D face recognition and verification evaluations. Examples with a large number (>1000) of images include:
•The Face Recognition Grand Challenge v2 3D dataset (FRGC v2) [73].
•The Bosphorus 3D face dataset (Bosphorus) [13].
•The Texas 3D face recognition dataset (Texas3DFRD) [86].
•The Notre Dame 3D face dataset (ND-2006) [30].
- •The Binghamton University 3D facial expression dataset (BU-3DFE) [93].
•The Chinese Academy of Sciences Institute of Automation 3D dataset (CASIA) [16].
•The University of York 3D face dataset (York3DFD) [45].
Datasets are characterized by the number of 3D face images captured, the number of subjects imaged, the quality and resolution of the images and the variations in facial pose, expression, and occlusion. Table 8.1 outlines this characterization for the datasets mentioned above.
Datasets comprising several thousand images, gathered from several hundred subjects, allow researchers to observe relatively small differences in performance in a statistically significant way. Different datasets are more or less suitable for evaluating different types of face recognition scenario. For example, a frontal-view, neutral-expression dataset is suitable for 'verification-of-cooperative-subject' evaluations, but a dataset with pose variations, expression variations and occlusion by head accessories (such as hats and spectacles) is necessary for evaluating a typical 'identification-at-a-distance' scenario for non-cooperating subjects, who may even be unaware that they are being imaged. This type of dataset is more challenging to collect, as all common combinations of variations need to be covered, leading to a much larger number of captures per subject.
Table 8.1 A selection of 3D face datasets used for face recognition evaluation. (Subj. = number of subjects; Samp. = scans per subject; Size = total number of scans; Exp., Pose, Occ. = whether expression, pose and occlusion variations are present.)

Dataset       Subj.   Samp.   Size    Exp.   Pose   Occ.
FRGC v2       466     1–22    4950    Yes    No     No
Bosphorus     105     29–54   4666    Yes    Yes    Yes
Texas3DFRD    118     1–89    1149    Yes    No     No
ND-2006       888     1–63    13450   Yes    Yes    No
BU-3DFE       100     25      2500    Yes    No     No
CASIA         123     15      1845    Yes    No     No
York3DFD      350     15      5250    Yes    Yes    No
Due to space limitations, we only describe the most influential of these, the FRGC v2 3D face dataset and the Bosphorus dataset. The reader is referred to the literature references for the details of other datasets.
8.3.1 FRGC v2 3D Face Dataset
Over recent years, the Face Recognition Grand Challenge version 2 (FRGC v2) 3D face dataset [73] has been the most widely used dataset in benchmark face identification and verification evaluations. Many researchers use this dataset because it allows them to compare the performance of their algorithm with many others. Indeed, it has become almost expected within the research community that face recognition evaluations will include the use of this dataset, at least for the cooperating subject verification scenario.
The FRGC v2 dataset contains 4950 3D face captures, where each capture contains two channels of information: a structured 3D point cloud² and a registered color-texture image (i.e. a standard 2D image). An example of both of these channels of information, each of which has 640 × 480 resolution, is shown in Fig. 8.3. Thus, in addition to 3D face recognition, 2D/3D multimodal recognition is supported. Results comparing 2D, 3D and multimodal PCA-based recognition on this dataset are given in [18]. There is a direct correspondence between the 3D points in the 640 × 480 range data (abs format) and the pixels in the 640 × 480 color-texture image (ppm format). The range data file is an ASCII text file containing some header information, a 640 × 480 array of binary flags indicating where a valid range reading has been made, and the (X, Y, Z) tuples over a 640 × 480 array. Where a range flag reads invalid, each value in the tuple is set to −999999.0.
The range sensor employed collects the texture data just after the range data. This opens up the possibility of the subject moving slightly between the two captures, which can lead to poor registration between the range and texture channels.
² 3D points are structured in a rectangular array and, since (x, y) values are included, strictly it is not a range image, which contains z-values only.
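The structure just described is enough to write a simple reader. The sketch below is a minimal Python example: the three-line header and the one-line-per-array layout are assumptions for illustration and should be checked against the actual files, but the flag array, the (X, Y, Z) arrays and the −999999.0 invalid marker follow the description above.

```python
# Sketch of a reader for the FRGC v2 ABS range files described above.
# Assumptions (not confirmed by the text): a 3-line ASCII header, and the
# flag, X, Y and Z arrays each stored as one whitespace-separated line.
import numpy as np

def read_abs(path, width=640, height=480, header_lines=3, invalid=-999999.0):
    """Return a (height, width, 3) array of (X, Y, Z), NaN where invalid."""
    with open(path) as f:
        lines = [ln for ln in f.read().splitlines() if ln.strip()]
    flags, x, y, z = (
        np.array(lines[header_lines + i].split(), dtype=float).reshape(height, width)
        for i in range(4)
    )
    xyz = np.stack([x, y, z], axis=-1)           # structured (row, col, XYZ) array
    xyz[(flags == 0) | (z == invalid)] = np.nan  # mask out invalid range readings
    return xyz

# scan = read_abs("face.abs")  # hypothetical file name
```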
Fig. 8.3 Top row: 2D images reprinted from the FRGC v2 dataset [73]. Second row: a rendering of the corresponding 3D data, after conversion from ABS to OBJ format
Often the texture channel is used to mark up fiducial points on the face, such as the eye corners, nose tip and mouth corners, and these are then mapped onto the range data using the known, implicit registration to give 3D landmarks. In this case, and in other cases where channel registration is important, scans with poor registration between the two channels must not be used.
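Given the implicit registration, transferring a fiducial point marked at a texture pixel to a 3D landmark is a direct array lookup. A minimal sketch, reusing the hypothetical read_abs reader above:

```python
import numpy as np

def landmark_3d(xyz, row, col):
    """Look up the 3D point for a fiducial pixel (row, col) marked in the
    registered texture image, using the one-to-one pixel correspondence
    between the two channels. Returns None if the range reading at that
    pixel was flagged invalid (NaN in the structured array)."""
    point = xyz[row, col]
    return None if np.isnan(point).any() else point

# e.g. nose_tip = landmark_3d(read_abs("face.abs"), 240, 320)  # hypothetical pixel
```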
A key point in the proper evaluation of any supervised training system is that the differences between the training and testing 3D face scans must be comparable to what is expected when test data is gathered in a live operational recognition scenario. In dataset development, capturing consecutive scans of the same person leads to train/test data that is too similar for proper evaluation; hence the FRGC v2 dataset is divided into three sections, Spring-2003, Fall-2003 and Spring-2004, named after their periods of collection. The Spring-2003 set is regarded by Phillips et al. [73] as a training set. It consists of 943 3D face images, with subjects in a cooperating, near-frontal pose and a neutral facial expression. The majority of the images in this subset are at a depth such that the neck and shoulder areas are also imaged. In contrast, a small number of the images are close-up shots in which only the facial area is imaged, so the resolution of imaged facial features is higher in these scans. Also, within this training subset, ambient lighting is controlled.
In the Fall-2003 and Spring-2004 datasets, ambient lighting is uncontrolled and facial expressions are varied. Expressions include anger, happiness, sadness, surprise, disgust and 'puffy cheeks', although many subjects are not scanned with the full set of these expressions. Additionally, there is less depth variation than in the Spring-2003 subset, and all of the images are close-ups with little shoulder area captured. Again, pose can be regarded as frontal and cooperative, but there are some mild pose variations. These data subsets contain 4007 two-channel scans of 466 subjects (1–22 scans per subject), and Phillips et al. [73] regard them as validation datasets.
The development of this dataset has made a huge contribution to the advancement of 3D face recognition algorithms. However, it is interesting to note that pose invariance is an often-quoted advantage of 3D face recognition over 2D face recognition, and yet many recognition evaluations have been made on the FRGC dataset, which only contains the very mild pose variations associated with cooperative subjects. In non-cooperative-subject face recognition applications, other datasets are needed that contain scans from a wide range of head poses including, for example, full profile views, thus challenging the recognition system in terms of its ability to represent and match partial surfaces. One example of such a dataset is the Bosphorus dataset, which we now briefly outline.
8.3.2 The Bosphorus Dataset
The Bosphorus dataset contains 4666 3D face images of 105 subjects, with 29–54 3D captures per subject [13, 79]. Thus, compared to FRGC v2, there are fewer subjects but more images per subject, so that a wider variety of natural, unconstrained viewing conditions, associated with non-cooperative image capture, can be explored. There are three types of variation: expression, pose and occlusion. The standard set of six expressions (happiness, surprise, fear, anger, sadness and disgust) is included, augmented with a set of more primitive expressions based on action units of the Facial Action Coding System (FACS) to give 34 expressions in total. There are 14 different poses (with neutral expression), comprising 8 yaw angles, 4 pitch angles, and two combined rotations of approximately 45 degrees yaw and ±20 degrees pitch. Finally, there are up to 4 scans with occlusions, including hand over eye, hand over mouth, spectacles and hair over face.
The data is captured with an Inspeck Mega Capturor II 3D camera. Subjects sit at around 1.5 m from the camera, and the (X, Y, Z) resolution is 0.3 mm, 0.3 mm and 0.4 mm respectively. The associated color-texture images are 1600 × 1200 pixels. Example captures from the Bosphorus dataset, which are stored as structured depth maps, are shown in Fig. 8.4. Registration with an associated 2D image is stored as two arrays of texture coordinates. More details of the capture conditions can be found in [79].
Fig. 8.4 Example 3D scans from the Bosphorus dataset with occlusions, expression variations and pose variations [13, 79]. Figure adapted from [25]
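The text specifies only that the 2D registration is stored as two arrays of texture coordinates, so the following sketch shows one plausible way to use them, assuming coordinates normalized to [0, 1]; the actual Bosphorus convention may differ and should be verified against the dataset documentation.

```python
import numpy as np

def sample_color(image, tex_u, tex_v):
    """Sample the 1600x1200 color image at the stored texture coordinates
    of each depth-map point. tex_u and tex_v are assumed normalized to
    [0, 1] (a hypothetical convention). Returns per-point RGB values."""
    h, w = image.shape[:2]
    cols = np.clip(np.round(tex_u * (w - 1)).astype(int), 0, w - 1)
    rows = np.clip(np.round(tex_v * (h - 1)).astype(int), 0, h - 1)
    return image[rows, cols]
```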