Teacher-Student Training and Triplet Loss for Facial Expression Recognition under Occlusion
Mariana-Iuliana Georgescu, Radu Tudor Ionescu

TL;DR
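This paper introduces a knowledge distillation approach based on triplet loss for facial expression recognition under strong occlusion, targeting cases where about half of the face is hidden, e.g. by a Virtual Reality (VR) headset. The authors report significant accuracy gains in this setting.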
Contribution
It proposes a new triplet loss-based knowledge distillation method and combines it with classic teacher-student training to improve facial expression recognition on occluded faces.
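As a rough illustration of the classic teacher-student component, the sketch below shows one common formulation of a distillation loss: a weighted sum of the cross-entropy against the ground-truth label and the cross-entropy against temperature-softened teacher outputs. The summary does not state the exact loss used in the paper, so the formulation, the function names, and the `temperature` and `alpha` values are all assumptions made for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target_probs, predicted_probs, eps=1e-12):
    """Cross-entropy between a target and a predicted distribution."""
    return -sum(t * math.log(p + eps) for t, p in zip(target_probs, predicted_probs))


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Weighted sum of a hard-label term and a soft-target term.

    `temperature` and `alpha` are hypothetical values, not taken from the paper.
    """
    one_hot = [1.0 if i == label else 0.0 for i in range(len(student_logits))]
    # Hard term: student (occluded input) vs. ground-truth label.
    hard = cross_entropy(one_hot, softmax(student_logits))
    # Soft term: student vs. softened teacher (fully-visible input) outputs.
    soft = cross_entropy(softmax(teacher_logits, temperature),
                         softmax(student_logits, temperature))
    return alpha * hard + (1.0 - alpha) * soft


if __name__ == "__main__":
    teacher = [5.0, 0.0, 0.0]          # hypothetical teacher logits
    agreeing_student = [5.0, 0.0, 0.0]
    disagreeing_student = [0.0, 5.0, 0.0]
    print(distillation_loss(agreeing_student, teacher, label=0))
    print(distillation_loss(disagreeing_student, teacher, label=0))
```

In this sketch, a student whose logits agree with both the teacher and the label receives a lower loss than one that disagrees, which is the behaviour the teacher-student setup relies on.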
Findings
Significant accuracy improvements on FER+ and AffectNet datasets.
Effective handling of 50% occlusion in facial expression recognition.
Outperforms existing state-of-the-art methods for occluded faces.
Abstract
In this paper, we study the task of facial expression recognition under strong occlusion. We are particularly interested in cases where 50% of the face is occluded, e.g. when the subject wears a Virtual Reality (VR) headset. While previous studies show that pre-training convolutional neural networks (CNNs) on fully-visible (non-occluded) faces improves the accuracy, we propose to employ knowledge distillation to achieve further improvements. First of all, we employ the classic teacher-student training strategy, in which the teacher is a CNN trained on fully-visible faces and the student is a CNN trained on occluded faces. Second of all, we propose a new approach for knowledge distillation based on triplet loss. During training, the goal is to reduce the distance between an anchor embedding, produced by a student CNN that takes occluded faces as input, and a positive embedding (from the…
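The triplet objective mentioned in the abstract can be sketched with the standard hinge-style triplet loss, which pulls an anchor embedding toward a positive embedding and, in its usual form, pushes it away from a negative one. The abstract is truncated above, so this sketch leaves open where the positive and negative embeddings come from. The Euclidean distance, the `margin` value, and the embedding vectors are assumptions chosen for illustration, not details confirmed by the source.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard triplet loss: max(0, d(a, p) - d(a, n) + margin).

    `margin` is a hypothetical value; some formulations use squared distances.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings; real embeddings would be CNN feature vectors.
    anchor = [0.0, 0.0]        # e.g. student embedding of an occluded face
    positive = [0.1, 0.0]
    negative = [3.0, 4.0]
    print(triplet_loss(anchor, positive, negative))
```

The loss is zero once the anchor is closer to the positive than to the negative by at least the margin, so training only updates the student on triplets that still violate this condition.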
Taxonomy
Methods: Knowledge Distillation · Triplet Loss
