# The Do's and Don'ts for CNN-based Face Verification

**Authors:** Ankan Bansal, Carlos Castillo, Rajeev Ranjan, Rama Chellappa

arXiv: 1705.07426 · 2017-09-08

## TL;DR

This paper investigates key questions in CNN-based face verification, such as training on still images versus videos, dataset design, label noise effects, and face alignment, using multiple datasets including a new large-scale video dataset.

## Contribution

The paper provides empirical answers to open questions in CNN-based face recognition and introduces a new large-scale video dataset for training and evaluation.

## Key findings

- Training on still images can generalize to videos.
- Deeper datasets do not always outperform wider ones.
- Label noise can sometimes improve network performance.
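The second and third findings depend on how training sets are built. Below is a minimal sketch, assuming "deep" means few subjects with many images each and "wide" means many subjects with few images each, both at a fixed image budget. It also assumes label noise is injected by reassigning a fraction of labels uniformly at random to a different subject. The helpers `make_subset` and `add_label_noise` are hypothetical illustrations, not the authors' code, and the paper's exact sampling and noise protocol may differ.

```python
import random
from collections import defaultdict


def make_subset(samples, num_subjects, images_per_subject, seed=0):
    """Build a hypothetical 'deep' or 'wide' subset from (image_id, label) pairs.

    Deep: small num_subjects, large images_per_subject.
    Wide: large num_subjects, small images_per_subject.
    Subjects with fewer than images_per_subject images are not eligible.
    """
    rng = random.Random(seed)
    by_subject = defaultdict(list)
    for image_id, label in samples:
        by_subject[label].append(image_id)
    eligible = sorted(s for s, imgs in by_subject.items()
                      if len(imgs) >= images_per_subject)
    if len(eligible) < num_subjects:
        raise ValueError("not enough subjects with the requested number of images")
    subset = []
    for subject in rng.sample(eligible, num_subjects):
        for image_id in rng.sample(by_subject[subject], images_per_subject):
            subset.append((image_id, subject))
    return subset


def add_label_noise(samples, noise_rate, seed=0):
    """Reassign a fraction of labels to a different, randomly chosen subject.

    Uniform reassignment is an assumption made for this sketch.
    """
    if not 0.0 <= noise_rate <= 1.0:
        raise ValueError("noise_rate must lie in [0, 1]")
    labels = sorted({label for _, label in samples})
    if len(labels) < 2:
        raise ValueError("label noise needs at least two subjects")
    rng = random.Random(seed)
    noisy = list(samples)
    n_flip = int(round(noise_rate * len(samples)))
    for idx in rng.sample(range(len(samples)), n_flip):
        image_id, label = noisy[idx]
        new_label = rng.choice([l for l in labels if l != label])
        noisy[idx] = (image_id, new_label)
    return noisy
```

With 10 subjects of 20 images each, `make_subset(samples, 2, 20)` and `make_subset(samples, 10, 4)` both yield 40 images, so any accuracy gap between them reflects depth versus width rather than dataset size.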

## Abstract

While the research community appears to have developed a consensus on the methods of acquiring annotated data, design and training of CNNs, many questions still remain to be answered. In this paper, we explore the following questions that are critical to face recognition research: (i) Can we train on still images and expect the systems to work on videos? (ii) Are deeper datasets better than wider datasets? (iii) Does adding label noise lead to improvement in performance of deep networks? (iv) Is alignment needed for face recognition? We address these questions by training CNNs using CASIA-WebFace, UMDFaces, and a new video dataset and testing on YouTube Faces, IJB-A and a disjoint portion of UMDFaces datasets. Our new dataset, which will be made publicly available, has 22,075 videos and 3,735,476 human annotated frames extracted from them.
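Testing a still-image model on video benchmarks such as YouTube Faces requires turning many frames into one comparable score. A minimal sketch follows, assuming per-frame CNN features are mean-pooled into a video descriptor and two videos are compared by cosine similarity against a threshold. The pooling choice, the function names, and the `threshold=0.5` default are all hypothetical and are not taken from the paper.

```python
import math


def mean_embedding(frame_features):
    """Average per-frame feature vectors into a single video-level descriptor."""
    n = len(frame_features)
    dim = len(frame_features[0])
    return [sum(f[i] for f in frame_features) / n for i in range(dim)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def verify(video_a, video_b, threshold=0.5):
    """Return (score, same_identity) for two videos given per-frame features.

    threshold is a hypothetical operating point; in practice it is tuned on
    held-out pairs to hit a target false accept rate.
    """
    score = cosine_similarity(mean_embedding(video_a), mean_embedding(video_b))
    return score, score >= threshold
```

Mean pooling treats every frame equally; weighting frames by quality or alignment confidence is a common alternative when some frames are blurred or badly detected.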

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1705.07426/full.md

## Figures

19 figures with captions in the complete paper: https://tomesphere.com/paper/1705.07426/full.md

## References

41 references — full list in the complete paper: https://tomesphere.com/paper/1705.07426/full.md

---
Source: https://tomesphere.com/paper/1705.07426