7 3D Shape Matching for Retrieval and Recognition
Given two spin images P and Q with N bins each, the cross-correlation can be used to measure their similarity:

$$R(P,Q) = \frac{N\sum_i p_i q_i - \sum_i p_i \sum_i q_i}{\sqrt{\left(N\sum_i p_i^2 - \left(\sum_i p_i\right)^2\right)\left(N\sum_i q_i^2 - \left(\sum_i q_i\right)^2\right)}} \qquad (7.7)$$
It is easy to see that R lies in the range [−1, 1], with high values when the spin images are similar and low values when they are not. However, there is a problem when comparing two spin images with this measure: due to occlusions and clutter, one spin image often contains more information than the other. To limit this effect, only those 'pixels' where data exists in both images should be used. Since the cross-correlation depends on the number of pixels used to compute it, the amount of overlap affects the comparison; the confidence in the comparison is higher when more pixels are used. This confidence can be measured by the variance associated with the cross-correlation, so by combining the cross-correlation R and its variance we obtain a new similarity measure C, defined as:
$$C(P,Q) = \left(\operatorname{atanh}\, R(P,Q)\right)^2 - \lambda\,\frac{1}{N-3} \qquad (7.8)$$
where N is the number of overlapping pixels (pixels different from zero), λ is a constant and R is calculated using only the N overlapping pixels. Note that the hyperbolic arctangent transforms the correlation coefficient R into a distribution with better statistical properties, whose variance is 1/(N − 3). The measure C is high when the spin images are highly correlated and a large number of pixels overlap. In the experiments, λ was set to 3.
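Eqs. (7.7) and (7.8) translate directly into a few lines of NumPy. The sketch below is ours, not code from the chapter: the function name, the handling of the degenerate low-overlap case, and the clipping of R before applying atanh are all implementation assumptions.

```python
import numpy as np

def spin_image_similarity(P, Q, lam=3.0):
    """Cross-correlation R (Eq. 7.7) over overlapping pixels and the
    combined measure C (Eq. 7.8). P, Q are 2D spin-image arrays."""
    p, q = P.ravel().astype(float), Q.ravel().astype(float)
    overlap = (p != 0) & (q != 0)          # keep only pixels where both images have data
    p, q = p[overlap], q[overlap]
    N = p.size
    if N <= 3:                             # too little overlap to compare reliably
        return 0.0, float("-inf")
    num = N * np.dot(p, q) - p.sum() * q.sum()
    den = np.sqrt((N * np.dot(p, p) - p.sum() ** 2) *
                  (N * np.dot(q, q) - q.sum() ** 2))
    R = num / den
    # atanh(R) has variance 1/(N-3); lam weights the variance penalty
    C = np.arctanh(np.clip(R, -0.999999, 0.999999)) ** 2 - lam / (N - 3)
    return float(R), float(C)
```

Two identical spin images give R ≈ 1 and a large C, while an unrelated pair gives R near zero and a much smaller C.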
7.3.2.1 Matching
Given a scene, a random set of points is selected for matching. For each point, a set of correspondences is established between spin images from the scene and those calculated from the models. Given a point from the scene, its spin image is computed as previously described and then compared with all the stored spin images using Eq. (7.8). From these comparisons, a histogram is built by quantizing the similarity measure. The histogram records the occurrences of similarity values and can be seen as a distribution of similarities between the input spin image and the stored ones. As we are interested in high similarity values, these can be found as outliers of the histogram.
In practice, outliers are found by automatically evaluating the histogram. A standard way to localize outliers is to determine the fourth spread of the histogram, defined as the difference between the median of the largest N/2 measurements and the median of the smallest N/2 measurements. Let fs be the fourth spread; extreme outliers lie 3fs units above the median of the largest N/2 measurements. Note that
B. Bustos and I. Sipiran
with this method, the number of outliers can be greater than or equal to zero, so many correspondences per point can be found.
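The fourth-spread rule can be sketched directly on the list of similarity measures; the helper name below is ours.

```python
import numpy as np

def extreme_outliers(similarities):
    """Extreme outliers of a set of similarity measures: values more than
    3*fs above the median of the largest N/2 measurements, where fs is
    the fourth spread."""
    s = np.sort(np.asarray(similarities, dtype=float))
    half = s.size // 2
    lower_fourth = np.median(s[:half])     # median of the smallest N/2 values
    upper_fourth = np.median(s[-half:])    # median of the largest N/2 values
    fs = upper_fourth - lower_fourth       # the fourth spread
    return s[s > upper_fourth + 3.0 * fs]
```

Note that the result may well be empty, matching the observation above that zero or many correspondences per point can be found.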
Once we have the set of correspondences for each point in the scene, we need to organize them in order to recognize the correct model object in the scene. As a large number of correspondences will have been detected, it is necessary to filter them. First, correspondences with a similarity measure of less than half the maximum similarity are discarded. Second, geometric consistency can be used to eliminate bad correspondences. Given two correspondences C1 = (s1, m1) and C2 = (s2, m2), the geometric consistency is defined as follows:
$$d_{gc}(C_1, C_2) = \frac{2\,\left\| S_{m_2}(m_1) - S_{s_2}(s_1) \right\|}{\left\| S_{m_2}(m_1) \right\| + \left\| S_{s_2}(s_1) \right\|} \qquad (7.9)$$

$$D_{gc}(C_1, C_2) = \max\left\{ d_{gc}(C_1, C_2),\; d_{gc}(C_2, C_1) \right\}$$
where S_O(p) denotes the spin map function of point p using the local basis of point O, as defined in Eq. (7.2).
This geometric consistency measures consistency in both position and normals: Dgc is small when C1 and C2 are geometrically consistent. Correspondences that are not geometrically consistent with at least a quarter of the complete list of correspondences are eliminated. The final set of correspondences has a high probability of being correct, but it still needs to be grouped and verified.
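Assuming the spin map coordinates S_O(p) of Eq. (7.2) are available as 2D (alpha, beta) vectors, Eq. (7.9) can be sketched as below; the function names are ours.

```python
import numpy as np

def d_gc(a, b):
    """Directed geometric consistency (Eq. 7.9); a = S_{m2}(m1) and
    b = S_{s2}(s1), both 2D spin-map coordinates (alpha, beta)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 2.0 * np.linalg.norm(a - b) / (np.linalg.norm(a) + np.linalg.norm(b))

def D_gc(a12, b12, a21, b21):
    """Symmetric measure: max over both directions (C1 vs C2 and C2 vs C1)."""
    return max(d_gc(a12, b12), d_gc(a21, b21))
```

Identical spin-map coordinates give a consistency of zero, and larger values indicate less consistent pairs.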
Now, we group correspondences in order to compute a good transformation and hence perform the matching. A grouping measure is used which prioritizes correspondences that are far apart. The grouping measure is defined as:
$$w_{gc}(C_1, C_2) = \frac{d_{gc}(C_1, C_2)}{1 - \exp\!\left(-\left(\left\| S_{m_2}(m_1) \right\| + \left\| S_{s_2}(s_1) \right\|\right)/2\right)} \qquad (7.10)$$

$$W_{gc}(C_1, C_2) = \max\left\{ w_{gc}(C_1, C_2),\; w_{gc}(C_2, C_1) \right\}$$
The same measure can also be defined between a correspondence C and a group
of correspondences $\{C_1, C_2, \ldots, C_n\}$ as follows:

$$W_{gc}\left(C, \{C_1, C_2, \ldots, C_n\}\right) = \max_i\; W_{gc}(C, C_i) \qquad (7.11)$$
Therefore, given a set of possible correspondences L = {C1, C2, . . . , Cn}, the following algorithm is used to generate the groups:
•For each correspondence Ci ∈ L, initialize a group Gi = {Ci }.
–Find a correspondence Cj ∈ L − Gi such that Wgc(Cj , Gi ) is minimum. If Wgc(Cj , Gi ) < Tgc, update Gi = Gi ∪ {Cj }. The threshold Tgc is set between zero and one; if Tgc is small, only geometrically consistent correspondences remain. A commonly used value is 0.25.
–Continue until no more correspondences can be added.
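The greedy loop above can be sketched as follows. `Wgc` is assumed to be supplied as a pairwise function, the correspondence-to-group measure follows Eq. (7.11), and the function name is ours.

```python
def group_correspondences(L, Wgc, Tgc=0.25):
    """For each correspondence, grow a group greedily: repeatedly add the
    most consistent remaining correspondence while its Wgc against the
    group (max over members, Eq. 7.11) stays below Tgc."""
    groups = []
    for ci in L:
        group = [ci]
        candidates = [c for c in L if c is not ci]
        while candidates:
            to_group = lambda c: max(Wgc(c, g) for g in group)
            best = min(candidates, key=to_group)
            if to_group(best) >= Tgc:
                break                      # nothing consistent enough remains
            group.append(best)
            candidates.remove(best)
        groups.append(group)
    return groups
```

Each correspondence seeds one group, so n correspondences yield n (possibly overlapping) groups, as stated below.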
As a result, we have n groups, which are used as starting points for the final matching. For each group of correspondences {(mi , si )}, a rigid transformation T is calculated by minimizing the following error in the least squares sense:
$$E_T = \min_T \sum_i \left\| s_i - T(m_i) \right\|^2 \qquad (7.12)$$
where T (mi ) = Rmi + t, with R the rotation matrix and t the translation vector, representing the rotation and position of the viewpoint of si in the coordinate system of mi .
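Eq. (7.12) has a well-known closed-form solution via the SVD of the cross-covariance matrix (the Kabsch/Horn method). The chapter does not prescribe a particular solver, so treat the following as one standard sketch.

```python
import numpy as np

def fit_rigid_transform(M, S):
    """Least-squares R, t minimizing sum_i ||s_i - (R m_i + t)||^2 (Eq. 7.12).
    M, S: (n, 3) arrays of corresponding model and scene points."""
    M, S = np.asarray(M, float), np.asarray(S, float)
    cm, cs = M.mean(axis=0), S.mean(axis=0)
    H = (M - cm).T @ (S - cs)              # 3x3 cross-covariance of centered points
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T)) # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = cs - R @ cm
    return R, t
```

The determinant check keeps the solution a proper rotation rather than a reflection, which matters for noisy or near-planar point sets.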
As a last step, each transformation needs to be verified in order to be validated as a match. The model points in the correspondences are transformed using T . For each correspondence, the correspondence set is extended to neighboring points of both of its endpoints under a distance constraint; a threshold distance equal to twice the mesh resolution is used. If the final set of correspondences is larger than a quarter to a third of the number of vertices of the model, the transformation is considered valid and the match is accepted. Finally, with the final correspondence set, the transformation is refined using an iterative closest point algorithm.
7.3.2.2 Evaluation
Unfortunately, the models used in the experiments of the original work by Johnson [56] were obtained using a scanner and are not available to date. Johnson and Hebert [57] presented results measuring the robustness of their method against clutter and occlusion: they built 100 scenes involving four shapes, using a range scanner. The experiments were based on querying an object in a scene and determining whether the object was present, while also measuring the levels of occlusion and clutter.
The four shapes were used in each scene, so the number of runs was 400. Interestingly, there were no errors at levels of occlusion under 70 % and the rate of recognition was above 90 % at 80 % of occlusion. In addition, the recognition rate was greater than 80 % at levels of clutter under 60 %.
On the other hand, spin images have recently been evaluated in the Robust Feature Detection and Description Benchmark [21] (SHREC 2010 track). In this track, the goal was to evaluate the robustness of the descriptors against mesh transformations such as isometry, topology, holes, micro holes, scale, local scale, sampling, noise, and shot noise. The dataset consisted of three shapes taken from the TOSCA dataset. Subsequently, several transformations, at different levels, were applied to each shape. The resulting dataset contained 138 shapes. In addition, a set of correspondences was available in order to measure the distance between descriptors over corresponding points.
Table 7.4 Robustness results for Spin Images. Table reproduced from Bronstein et al. [21]

| Transform.  | Strength 1 | ≤2   | ≤3   | ≤4   | ≤5   |
|-------------|------------|------|------|------|------|
| Isometry    | 0.12       | 0.10 | 0.10 | 0.10 | 0.10 |
| Topology    | 0.11       | 0.11 | 0.11 | 0.11 | 0.11 |
| Holes       | 0.12       | 0.12 | 0.12 | 0.12 | 0.12 |
| Micro holes | 0.15       | 0.15 | 0.16 | 0.16 | 0.16 |
| Scale       | 0.18       | 0.15 | 0.15 | 0.15 | 0.15 |
| Local scale | 0.12       | 0.13 | 0.14 | 0.15 | 0.17 |
| Sampling    | 0.13       | 0.13 | 0.13 | 0.13 | 0.15 |
| Noise       | 0.13       | 0.15 | 0.17 | 0.19 | 0.20 |
| Shot noise  | 0.11       | 0.13 | 0.16 | 0.17 | 0.18 |
| Average     | 0.13       | 0.13 | 0.14 | 0.14 | 0.15 |
The evaluation was performed using the normalized Euclidean distance, Q(X, Y ), between the descriptors of corresponding points of two shapes X and Y ,
$$Q(X, Y) = \frac{1}{|F(X)|} \sum_{k=1}^{|F(X)|} \frac{\left\| f(y_k) - g(x_j) \right\|^2}{\left\| f(y_k) \right\|^2 + \left\| g(x_j) \right\|^2}, \qquad (7.13)$$
where (xj , yk ) are corresponding points, f (·) and g(·) are the descriptors of a point, and F (X) is the set of vertices to be considered. Here, we present the results obtained using F (X) = X.
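Eq. (7.13) reduces to a short NumPy computation once the corresponding descriptors have been gathered row by row into two (n, d) arrays; the function name below is ours.

```python
import numpy as np

def normalized_descriptor_distance(F, G):
    """Q(X, Y) of Eq. (7.13): mean normalized squared Euclidean distance
    between corresponding descriptor rows of F and G."""
    F, G = np.asarray(F, float), np.asarray(G, float)
    num = np.sum((F - G) ** 2, axis=1)     # ||f(y_k) - g(x_j)||^2 per pair
    den = np.sum(F ** 2, axis=1) + np.sum(G ** 2, axis=1)
    return float(np.mean(num / den))
```

The normalization bounds each per-pair term between 0 (identical descriptors) and 2 (opposite descriptors), so the reported averages in Table 7.4 are directly comparable across transformations.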
The best results were obtained for the isometry and topology transformations, with average distances of 0.10 and 0.11 respectively. This is because spin images are extracted locally, and these transformations do not modify the local structure of the mesh. On the other hand, the noise and shot noise transformations yielded higher distances (up to 0.20 and 0.18, respectively). Clearly, stronger levels of noise considerably modify the distribution of points on the surface, so the spin images cannot be constructed robustly. See Tables 7.4 and 7.5 for the complete results; Table 7.5 shows the performance of dense heat kernel signatures calculated on 3D meshes. Regarding robustness, spin images thus show some drawbacks. However, an important property of this approach is its robustness to occlusion, which has proved valuable in recognition applications.
7.3.2.3 Complexity Analysis
Let S be a 3D object with n vertices. In addition, let W be the number of rows and columns of a resulting spin image (we assume square spin images for the analysis). The complexity of each stage is given as follows: