Unimodal Face Classification with Multimodal Training
Wenbin Teng, Chongyang Bai

TL;DR
This paper introduces a Multimodal Training Unimodal Test (MTUT) framework that leverages cross-modality relationships during training to improve the robustness of face classification when only a single modality is available during testing.
Contribution
The novel MTUT framework uses intra- and cross-modality autoencoders with a divergence loss to learn robust unimodal embeddings from multimodal training data.
Findings
Outperforms ten baseline methods on 2D and 3D face datasets.
Handles noisy 2D images (e.g. poor-condition RGB) and 3D faces that lack color information.
Demonstrates robustness across different modalities and conditions.
Abstract
Face recognition is a crucial task in various multimedia applications such as security checks, credential access, and motion-sensing games. However, the task is challenging when an input face is noisy (e.g. a poor-condition RGB image) or lacks certain information (e.g. a 3D face without color). In this work, we propose a Multimodal Training Unimodal Test (MTUT) framework for robust face classification, which exploits the cross-modality relationship during training and applies it as a complement to the imperfect single-modality input during testing. Technically, during training, the framework (1) builds both intra-modality and cross-modality autoencoders with the aid of facial attributes to learn latent embeddings as multimodal descriptors, (2) proposes a novel multimodal embedding divergence loss to align the heterogeneous features from different modalities, which also adaptively avoids…
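To make the training-time idea concrete, here is a minimal NumPy sketch of the three loss terms the abstract names: intra-modality reconstruction, cross-modality reconstruction, and an embedding alignment term. It is an illustration under stated assumptions, not the authors' implementation. The single-layer encoders and decoders, the feature sizes, the function name `mtut_style_losses`, and the use of a mean squared distance as a stand-in for the paper's divergence loss are all hypothetical. Facial attributes and the adaptive behavior mentioned in the abstract are omitted.

```python
import numpy as np


def mse(a, b):
    """Mean squared error between two arrays of the same shape."""
    return float(np.mean((a - b) ** 2))


def mtut_style_losses(x_a, x_b, latent_dim=5, seed=0):
    """Toy forward pass over two paired modalities (hypothetical helper).

    x_a, x_b: arrays of shape (batch, features_a) and (batch, features_b).
    Returns the embedding of modality A and a dict of loss terms.
    """
    rng = np.random.default_rng(seed)
    dim_a, dim_b = x_a.shape[1], x_b.shape[1]

    # Randomly initialised single-layer encoders and decoders (assumption:
    # the real model is deeper and trained by gradient descent).
    enc_a = rng.normal(scale=0.1, size=(dim_a, latent_dim))
    enc_b = rng.normal(scale=0.1, size=(dim_b, latent_dim))
    dec_a = rng.normal(scale=0.1, size=(latent_dim, dim_a))
    dec_b = rng.normal(scale=0.1, size=(latent_dim, dim_b))

    # One latent embedding per modality.
    z_a = np.tanh(x_a @ enc_a)
    z_b = np.tanh(x_b @ enc_b)

    losses = {
        # Intra-modality: each embedding reconstructs its own input.
        "intra": mse(z_a @ dec_a, x_a) + mse(z_b @ dec_b, x_b),
        # Cross-modality: each embedding reconstructs the other input.
        "cross": mse(z_a @ dec_b, x_b) + mse(z_b @ dec_a, x_a),
        # Stand-in alignment term (NOT the paper's divergence loss):
        # mean squared distance between paired embeddings.
        "align": mse(z_a, z_b),
    }
    losses["total"] = losses["intra"] + losses["cross"] + losses["align"]
    return z_a, losses


# Toy batch: 4 faces with made-up 2D and 3D feature vectors.
data_rng = np.random.default_rng(1)
x_2d = data_rng.normal(size=(4, 12))
x_3d = data_rng.normal(size=(4, 9))
z_2d, losses = mtut_style_losses(x_2d, x_3d)
```

In a unimodal test setting, only one encoder would be applied to the available input (here `z_2d` from `x_2d`), and a classifier would operate on that embedding alone.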
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition · Biometric Identification and Security
