Learning Face Representation from Scratch
Dong Yi, Zhen Lei, Shengcai Liao, and Stan Z. Li

TL;DR
This paper introduces CASIAWebFace, a large-scale face dataset, and shows that an 11-layer CNN trained on it achieves state-of-the-art face recognition accuracy on LFW and YTF, underscoring how much training data matters.
Contribution
It presents a semi-automatic method for collecting face images from the Internet into a large-scale dataset, and shows that a CNN trained on this data reaches state-of-the-art recognition accuracy.
Findings
Achieved state-of-the-art accuracy on the LFW and YTF benchmarks with an 11-layer CNN.
Built a large-scale dataset of about 10,000 subjects and 500,000 images.
Argued that, in the current state of face recognition, data matters more than the algorithm.
Abstract
Driven by big data and deep convolutional neural networks (CNNs), the performance of face recognition is becoming comparable to that of humans. Using private large-scale training datasets, several groups achieve very high performance on LFW, i.e., 97% to 99%. While there are many open-source implementations of CNNs, no large-scale face dataset is publicly available. The current situation in the field of face recognition is that data is more important than the algorithm. To solve this problem, this paper proposes a semi-automatic way to collect face images from the Internet and builds a large-scale dataset containing about 10,000 subjects and 500,000 images, called CASIAWebFace. Based on the database, we use an 11-layer CNN to learn a discriminative representation and obtain state-of-the-art accuracy on LFW and YTF. The publication of CASIAWebFace will attract more research groups entering this field…
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition · Advanced Image and Video Retrieval Techniques
