Robust Face Recognition via Multimodal Deep Face Representation
Changxing Ding, Dacheng Tao

TL;DR
This paper introduces a multimodal deep learning framework that combines a set of CNNs with a stacked auto-encoder to improve face recognition under pose, illumination, and expression variations, achieving over 99% accuracy on LFW.
Contribution
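It presents a multimodal deep learning architecture in which a set of CNNs extracts complementary facial features from multimodal data, and a three-layer stacked auto-encoder compresses the concatenated features into a compact representation for robust face recognition.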
Findings
Achieved 98.43% verification rate on LFW with a single CNN.
Surpassed 99.0% recognition rate on LFW using a small ensemble system.
Demonstrated robustness to pose, illumination, and expression variations.
Abstract
Face images appearing in multimedia applications, e.g., social networks and digital entertainment, usually exhibit dramatic pose, illumination, and expression variations, resulting in considerable performance degradation for traditional face recognition algorithms. This paper proposes a comprehensive deep learning framework to jointly learn face representation using multimodal information. The proposed deep learning structure is composed of a set of elaborately designed convolutional neural networks (CNNs) and a three-layer stacked auto-encoder (SAE). The set of CNNs extracts complementary facial features from multimodal data. Then, the extracted features are concatenated to form a high-dimensional feature vector, whose dimension is compressed by SAE. All the CNNs are trained using a subset of 9,000 subjects from the publicly available CASIA-WebFace database, which ensures the…
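The fusion step described in the abstract (concatenate the per-CNN features, then compress them with a three-layer SAE) can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the number of CNNs, the feature dimensions, the sigmoid activation, and the random untrained weights are all assumptions made for the example, and the helper names `make_encoder` and `fuse_and_compress` are hypothetical.

```python
import numpy as np


def make_encoder(dims, seed=0):
    """Create random (untrained) encoder weights for consecutive layer sizes.

    `dims` lists the layer widths, e.g. [2048, 1024, 512, 256] gives
    three encoding layers. All sizes here are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    weights = [rng.normal(0.0, 0.01, size=(n_in, n_out))
               for n_in, n_out in zip(dims[:-1], dims[1:])]
    biases = [np.zeros(n_out) for n_out in dims[1:]]
    return weights, biases


def fuse_and_compress(features, weights, biases):
    """Concatenate per-CNN feature vectors and apply the encoder layers."""
    # Concatenation forms the high-dimensional fused feature vector.
    x = np.concatenate(features, axis=-1)
    # Each encoder layer reduces the dimension (sigmoid is an assumption).
    for W, b in zip(weights, biases):
        x = 1.0 / (1.0 + np.exp(-(x @ W + b)))
    return x


# Stand-in for the outputs of 4 hypothetical CNNs, 512-d each, batch of 2 faces.
rng = np.random.default_rng(1)
features = [rng.normal(size=(2, 512)) for _ in range(4)]

weights, biases = make_encoder([4 * 512, 1024, 512, 256])
compressed = fuse_and_compress(features, weights, biases)
```

In a real system the encoder weights would be learned, and the compressed vectors would then be compared to verify or identify faces; the sketch only shows how the data flows from concatenation to compression.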
