

8.4 3D Face Recognition Evaluation

The evaluation of a 3D face recognition system depends on the application that it is intended for. In general, a face recognition system operates in either a verification application, which requires one-to-one matching, or an identification application, which requires one-to-many matching. We discuss each of these in turn in the following subsections.

8.4.1 Face Verification

In a verification application, the face recognition system must supply a binary accept or reject decision in response to the subject's claimed identity, which is associated with one of a stored set of gallery scans. It does this by generating a match score between the captured 3D data (the probe or query) and the gallery data of the claimed identity. Often this is implemented as a distance metric between feature vectors, such as the Euclidean distance, the cosine distance or the Mahalanobis distance (in the case of multiple images of the same person in the gallery). A low score on the distance metric indicates a close match, and applying a suitable threshold generates the accept/reject decision. Verification systems are mainly used by authorized individuals who want to gain their rightful access to a building or computer system. Consequently, they tend to be cooperative, adopting a neutral expression and a frontal pose at a favorable distance from the camera. For such cooperative scenarios, datasets such as FRGC v2 provide suitable training and validation data for verification tests.
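As an illustration, the following minimal sketch implements such a threshold-based verification decision, assuming that face scans have already been reduced to fixed-length feature vectors; the function names and the example threshold are ours, not from the chapter.

```python
# Minimal sketch of a distance-based verification decision.
# Assumes probe and gallery scans are already encoded as feature vectors;
# the threshold value of 0.5 is purely illustrative.
import numpy as np

def verify(probe_feat, gallery_feat, threshold=0.5, metric="euclidean"):
    """Return True (accept) if the probe matches the claimed gallery identity."""
    if metric == "euclidean":
        score = np.linalg.norm(probe_feat - gallery_feat)
    elif metric == "cosine":
        # Cosine distance: 0 for identical directions, up to 2 for opposite ones.
        score = 1.0 - np.dot(probe_feat, gallery_feat) / (
            np.linalg.norm(probe_feat) * np.linalg.norm(gallery_feat))
    else:
        raise ValueError(f"unknown metric: {metric}")
    # A low distance indicates a close match; accept below the threshold.
    return score <= threshold
```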

In order to evaluate a verification system, a large number of verification tests need to be performed, where the subject’s identity associated with the test 3D capture is known, so that it can be established whether the accept/reject decision was correct. This identity can be extracted from the filename of a 3D face capture within a dataset by means of a unique subject identifier.
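For example, FRGC v2 range captures carry the subject identifier as the digits before the 'd' in names such as 04202d546.abs; a hedged sketch of recovering it is shown below (the exact pattern is an assumption and should be adapted to the dataset at hand).

```python
# Sketch of recovering ground-truth identity from a capture's filename,
# assuming FRGC-style names of the form <subject-id>d<capture-id>.
import re

def subject_id(filename):
    """Extract the unique subject identifier from an FRGC-style filename."""
    m = re.match(r"(\d+)d\d+", filename)
    if m is None:
        raise ValueError(f"unexpected filename format: {filename}")
    return m.group(1)

assert subject_id("04202d546.abs") == "04202"
```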

A key point is that the accept/reject decision is threshold dependent, and it is desirable to explore the system performance over a wide range of such thresholds. Given a set of 3D face scans, all with known identities and with at least two scans of each subject, this can be implemented as follows. Every scan in the set is compared with every other scan, excluding itself, and a match score is formed. Two lists are then built: one containing matches between scans of the same subject and the other containing matches between different subjects. We then vary the threshold from zero, so that all decisions are reject, to the maximum score value, so that all decisions are accept. For each threshold value, we count the number of reject decisions in the same identity (SI) list to form a false rejection rate (FRR), expressed as a percentage of the SI list size. Likewise, we count the number of accept decisions in the different identity (DI) list to form a false acceptance rate (FAR), expressed as a percentage of the DI list size. Ideally, both FAR and FRR would be zero, which describes a perfect system. In reality, verification systems are not perfect and both false accepts and false rejects occur. False accepts can be reduced by decreasing the threshold, but this increases false rejects, and vice versa. A receiver operating characteristic (ROC) curve, as defined in biometric verification tests, is a plot of FRR against FAR over all thresholds, giving a visualization of the tradeoff between these two performance metrics. Depending on the dataset size and the number of scans per person, the SI and DI list sizes can be very different, with DI usually much larger than SI. This implies that values on the FRR axis are noisier than those on the FAR axis.
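The threshold sweep described above translates directly into code. The sketch below is a minimal illustration, assuming match scores are distances (lower is better) and that every comparison has been stored as a (score, same-identity) pair; the number of threshold steps is arbitrary.

```python
# Sketch of the FAR/FRR threshold sweep over all pairwise comparisons.
# `scores` holds one (distance, same_identity_flag) pair per comparison
# of distinct scans; rates are returned as percentages, as in the text.
import numpy as np

def roc_from_scores(scores, n_steps=1000):
    """Return (thresholds, FAR, FRR) arrays for a sweep from 0 to the max score."""
    si = np.array([s for s, same in scores if same])      # same identity (SI) list
    di = np.array([s for s, same in scores if not same])  # different identity (DI) list
    thresholds = np.linspace(0.0, max(si.max(), di.max()), n_steps)
    # Reject when distance exceeds the threshold; accept otherwise.
    frr = np.array([100.0 * np.mean(si > t) for t in thresholds])   # rejects in SI
    far = np.array([100.0 * np.mean(di <= t) for t in thresholds])  # accepts in DI
    return thresholds, far, frr
```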

In order to measure system performance using a ROC curve, we can use the concept of an equal error rate (EER), the point where FAR = FRR; a lower EER indicates a better performance for a given face verification system. Given that a false accept often carries a higher penalty than a false reject, it is common practice to fix a suitably low FAR and then indicate performance by either the FRR (the lower the better) or the true accept rate, TAR = 1 − FRR (the higher the better). TAR is commonly known as the verification rate and is often expressed as a percentage. In FRGC benchmark verification tests, this FAR is set at 0.001 (0.1 %).
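Continuing the previous sketch, both summary figures can be read off the FAR/FRR arrays; the interpolation-free EER approximation below is our simplification.

```python
# Summary metrics from the FAR/FRR arrays of the previous sketch
# (all rates in percent).
import numpy as np

def eer(far, frr):
    """Equal error rate: the rate where FAR and FRR are (approximately) equal."""
    i = int(np.argmin(np.abs(far - frr)))
    return 0.5 * (far[i] + frr[i])

def tar_at_far(far, frr, target_far=0.1):
    """True accept rate (TAR = 100 - FRR) at the loosest threshold with FAR <= target."""
    valid = far <= target_far
    return 100.0 - frr[valid].min() if valid.any() else 0.0
```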

At the time of writing, high-performance 3D face recognition systems report verification rates that typically range from around 96.5 % to 98.5 % (to the nearest half percent) at 0.1 % FAR on the full FRGC v2 dataset [51, 64, 76]. This is a significant improvement of around 20 % over PCA-based baseline results [71]. It is reasonable to assume that, in many verification scenarios, the subject will cooperate with a neutral expression. When probes are restricted to neutral expressions, verification rates range from around 98.5 % to 99.5 % [51, 64] at 0.1 % FAR.

8.4.2 Face Identification

In face identification, the probe (query) 3D capture is matched against a stored gallery of 3D captures ('models') with known identity labels, and a set of match scores is generated. Identification thus requires a one-to-many match process, in contrast to verification's one-to-one match. The match with the highest score (or, equivalently, the lowest distance metric) provides the identity of the probe. If the closest match is not close enough, by application of some threshold, the system may return a null response, indicating that the probe does not match any subject in the gallery. This is a more difficult problem than verification since, if there are 1000 subjects in the gallery, the system has to select the correct response from 1001 possible responses (including the null response).
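The following minimal sketch illustrates this one-to-many match with a null response, under the same assumptions as the verification example (feature vectors, Euclidean distance); the gallery is a dictionary from identity labels to stored features, and the rejection threshold is illustrative.

```python
# Sketch of one-to-many identification with a null response.
# `gallery` maps identity labels to stored feature vectors.
import numpy as np

def identify(probe_feat, gallery, reject_threshold=0.5):
    """Return the best-matching gallery label, or None if no match is close enough."""
    best_label, best_dist = None, np.inf
    for label, feat in gallery.items():
        d = np.linalg.norm(probe_feat - feat)  # lowest distance = highest match score
        if d < best_dist:
            best_label, best_dist = label, d
    # Null response: even the closest match is too far to be the same person.
    return best_label if best_dist <= reject_threshold else None
```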

In order to test how good a 3D face recognition system is at identification, a large number of identification tests are performed, with no 3D model compared against itself, and we determine the percentage of correct identifications. This gives the rank-1 identification rate, meaning that an identification counts as correct only when the best (rank-1) score achieved when matching to the gallery belongs to the probe's true identity. However, we can imagine real-world identification scenarios where the rank-1 test is too severe and where we may be interested in a wider performance metric. One such scenario is the watch list, where 3D models of a small number of known criminals may be stored in the