Multi Teacher Privileged Knowledge Distillation for Multimodal Expression Recognition

Muhammad Haseeb Aslam; Marco Pedersoli; Alessandro Lameiras Koerich; Eric Granger

arXiv:2408.09035·cs.CV·August 20, 2024

TL;DR

This paper introduces a multi-teacher privileged knowledge distillation method using structural similarity and optimal transport to improve multimodal emotion recognition, outperforming existing PKD approaches.

Contribution

It proposes a novel multi-teacher PKD framework with structural similarity and optimal transport, enhancing robustness and accuracy in multimodal emotion recognition.

Findings

1. Outperforms state-of-the-art PKD methods on Affwild2 and Biovid datasets.

2. Improves visual-only baseline accuracy by 5.5% on Biovid.

3. Enhances valence and arousal prediction accuracy by 3% and 5%, respectively.

Abstract

Human emotion is a complex phenomenon conveyed and perceived through facial expressions, vocal tones, body language, and physiological signals. Multimodal emotion recognition systems can perform well because they learn complementary and redundant semantic information from diverse sensors. In real-world scenarios, however, only a subset of the modalities used for training may be available at test time. Learning with privileged information allows a model to exploit data from additional modalities that are available only during training. State-of-the-art (SOTA) methods for privileged knowledge distillation (PKD) have been proposed to distill information from a teacher model (with privileged modalities) to a student model (without privileged modalities). However, such PKD methods rely on point-to-point matching and do not explicitly capture relational information. Recently, methods have been proposed to distill the structural information.…
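The contrast the abstract draws between point-to-point matching and relational (structural) matching can be illustrated with a minimal sketch. This is not the paper's implementation: the function names, the use of cosine similarity, the mean-squared-error form of both losses, and the random features are all assumptions made for illustration.

```python
import numpy as np


def cosine_similarity_matrix(feats):
    """Pairwise cosine similarity between the rows (samples) of a batch."""
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    unit = feats / np.clip(norms, 1e-12, None)  # guard against zero vectors
    return unit @ unit.T


def point_to_point_loss(teacher, student):
    """Compare each student feature directly with its teacher counterpart."""
    return float(np.mean((teacher - student) ** 2))


def structural_loss(teacher, student):
    """Compare how samples relate to each other within each batch."""
    diff = cosine_similarity_matrix(teacher) - cosine_similarity_matrix(student)
    return float(np.mean(diff ** 2))


# Hypothetical features: 8 samples, 16 dimensions (illustrative values only).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(8, 16))

# A student whose features are a rescaled copy of the teacher's: every
# individual feature differs, but the relations between samples are the same.
student = 2.0 * teacher

p2p = point_to_point_loss(teacher, student)
rel = structural_loss(teacher, student)
print(f"point-to-point loss: {p2p:.4f}")
print(f"structural loss:     {rel:.4e}")
```

In this toy case the point-to-point loss penalizes the student even though it preserves every pairwise relation among samples, while the structural loss is essentially zero. That gap is the kind of relational information that point-to-point matching does not explicitly capture.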

Code & Models

Repositories

haseebaslam95/MT-PKDOT (PyTorch, official)

Taxonomy

Topics: Speech and dialogue systems · Subtitles and Audiovisual Media · Hand Gesture Recognition Systems

Methods: ALIGN