Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition
Muhammad Haseeb Aslam, Marco Pedersoli, Alessandro Lameiras Koerich, Eric Granger

TL;DR
This paper introduces a multi-teacher privileged knowledge distillation (PKD) method that uses structural similarity and optimal transport to improve multimodal emotion recognition, outperforming existing PKD approaches.
Contribution
It proposes a novel multi-teacher PKD framework with structural similarity and optimal transport, enhancing robustness and accuracy in multimodal emotion recognition.
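This summary does not say how optimal transport enters the distillation loss, so the following is only a generic sketch of entropy-regularised optimal transport (Sinkhorn iterations) between a batch of teacher embeddings and a batch of student embeddings. The function names, the cosine cost, the uniform sample weights, and the values of `reg` and `n_iters` are all assumptions made for illustration, not the paper's formulation.

```python
import numpy as np

def cosine_cost(x, y):
    """Cost matrix 1 - cos(x_i, y_j) for the rows of x and y (same feature size)."""
    xn = x / np.linalg.norm(x, axis=1, keepdims=True)
    yn = y / np.linalg.norm(y, axis=1, keepdims=True)
    return 1.0 - xn @ yn.T

def sinkhorn_plan(cost, reg=0.5, n_iters=500):
    """Entropy-regularised transport plan between two uniform distributions."""
    n, m = cost.shape
    a = np.full(n, 1.0 / n)  # assumed uniform mass on teacher samples
    b = np.full(m, 1.0 / m)  # assumed uniform mass on student samples
    kernel = np.exp(-cost / reg)
    u = np.ones(n)
    for _ in range(n_iters):
        # Alternately rescale columns and rows to match the marginals.
        v = b / (kernel.T @ u)
        u = a / (kernel @ v)
    return u[:, None] * kernel * v[None, :]

# Hypothetical data: the student is a noisy copy of the teacher embeddings.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(6, 16))
student = teacher + 0.1 * rng.normal(size=(6, 16))

cost = cosine_cost(teacher, student)
plan = sinkhorn_plan(cost)
ot_loss = float(np.sum(plan * cost))  # transport cost usable as a loss term
```

A larger `reg` spreads mass more evenly across pairs and converges faster, while a smaller `reg` pushes the plan towards a hard one-to-one matching at the cost of more iterations.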
Findings
Outperforms state-of-the-art PKD methods on the Aff-Wild2 and BioVid datasets.
Improves visual-only baseline accuracy by 5.5% on BioVid.
Enhances valence and arousal prediction accuracy by 3% and 5%, respectively.
Abstract
Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals. Multimodal emotion recognition systems can perform well because they can learn complementary and redundant semantic information from diverse sensors. In real-world scenarios, only a subset of the modalities employed for training may be available at test time. Learning privileged information allows a model to exploit data from additional modalities that are only available during training. State-of-the-art methods for privileged knowledge distillation (PKD) have been proposed to distill information from a teacher model (with privileged modalities) to a student model (without privileged modalities). However, such PKD methods utilize point-to-point matching and do not explicitly capture the relational information. Recently, methods have been proposed to distill the structural information.…
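The contrast the abstract draws between point-to-point matching and relational (structural) matching can be illustrated with a small sketch. Everything here is an assumption for illustration: the function names, the use of mean squared error, and the choice of a cosine similarity matrix as the structure are not taken from the paper. The hypothetical student is a rotated copy of the teacher embeddings, so the pairwise relations between samples are preserved while the individual coordinates are not.

```python
import numpy as np

def cosine_similarity_matrix(features):
    """Pairwise cosine similarity between the rows of a (batch, dim) array."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.clip(norms, 1e-12, None)
    return unit @ unit.T

def point_to_point_loss(teacher, student):
    """Mean squared error between matching teacher and student embeddings."""
    return float(np.mean((teacher - student) ** 2))

def structural_loss(teacher, student):
    """Mean squared error between the two batch-level similarity matrices."""
    diff = cosine_similarity_matrix(teacher) - cosine_similarity_matrix(student)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))  # hypothetical teacher embeddings

# Hypothetical student: the teacher embeddings under a random orthogonal map.
rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
student = teacher @ rotation

p2p = point_to_point_loss(teacher, student)
rel = structural_loss(teacher, student)
```

In this toy setting the point-to-point loss is large even though the student reproduces the teacher's sample-to-sample relations, whereas the structural loss is zero up to floating-point error.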
Taxonomy
Topics: Speech and dialogue systems · Subtitles and Audiovisual Media · Hand Gesture Recognition Systems
Methods: ALIGN
