Self-supervised Contrastive Learning of Multi-view Facial Expressions
Shuvendu Roy, Ali Etemad

TL;DR
This paper introduces CL-MEx, a self-supervised contrastive learning framework that improves multi-view facial expression recognition, especially for non-frontal faces, by learning view-invariant embeddings before supervised fine-tuning.
Contribution
It proposes a novel two-step training method combining self-supervised contrastive learning with supervised fine-tuning for multi-view FER.
Findings
Achieves state-of-the-art results on KDEF and DDCF datasets.
Demonstrates robustness to challenging angles.
Effective with reduced labeled data.
Abstract
Facial expression recognition (FER) has emerged as an important component of human-computer interaction systems. Despite recent advancements in FER, performance often drops significantly for non-frontal facial images. We propose Contrastive Learning of Multi-view facial Expressions (CL-MEx) to exploit facial images captured simultaneously from different angles towards FER. CL-MEx is a two-step training framework. In the first step, an encoder network is pre-trained with the proposed self-supervised contrastive loss, where it learns to generate view-invariant embeddings for different views of a subject. The model is then fine-tuned with labeled data in a supervised setting. We demonstrate the performance of the proposed method on two multi-view FER datasets, KDEF and DDCF, where state-of-the-art performances are achieved. Further experiments show the robustness of our method in dealing…
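The abstract says the encoder is pre-trained with a self-supervised contrastive loss that pulls together embeddings of different views of the same subject. This excerpt does not give the exact form of that loss. The sketch below is a minimal illustration that assumes a SupCon/NT-Xent-style objective. Simultaneously captured views of the same subject act as positives, and every other image in the batch acts as a negative. The function name `multiview_contrastive_loss`, the `subject_ids` grouping, and the temperature of 0.1 are all hypothetical choices made for illustration. The sketch does not reproduce the authors' implementation.

```python
import numpy as np


def multiview_contrastive_loss(embeddings, subject_ids, temperature=0.1):
    """Hypothetical multi-view contrastive loss (SupCon-style sketch).

    Assumption: rows of `embeddings` sharing a `subject_ids` value are
    different camera views of the same subject/expression and are positives.
    """
    # L2-normalise so dot products are cosine similarities
    z = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = z @ z.T / temperature
    n = z.shape[0]
    self_mask = np.eye(n, dtype=bool)

    # Exclude each anchor's similarity with itself from the denominator
    sim_masked = np.where(self_mask, -np.inf, sim)
    row_max = np.max(sim_masked, axis=1, keepdims=True)
    log_denom = row_max[:, 0] + np.log(np.sum(np.exp(sim_masked - row_max), axis=1))
    log_prob = sim - log_denom[:, None]

    # Positives: other views of the same subject
    pos_mask = (subject_ids[:, None] == subject_ids[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    if np.any(pos_counts == 0):
        raise ValueError("every anchor needs at least one other view of its subject")

    # Average log-probability of the positives for each anchor, then over anchors
    mean_log_prob_pos = np.sum(np.where(pos_mask, log_prob, 0.0), axis=1) / pos_counts
    return float(-mean_log_prob_pos.mean())


# Usage: two subjects, three views each; view-invariant embeddings
ids = np.array([0, 0, 0, 1, 1, 1])
aligned = np.array([[1.0, 0.0]] * 3 + [[0.0, 1.0]] * 3)
print(multiview_contrastive_loss(aligned, ids))
```

When all views of a subject map to the same embedding, the loss is close to its floor of log(number of positives). Embeddings that scatter a subject's views across the space are penalised. In the two-step scheme the abstract describes, the encoder trained this way would then be fine-tuned with labelled data in a supervised setting.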
Taxonomy
Methods: Contrastive Learning
