Decoupled Hierarchical Distillation for Multimodal Emotion Recognition
Yong Li, Yuanzhi Wang, Yi Ding, Shiqing Zhang, Ke Lu, Cuntai Guan

TL;DR
This paper introduces a hierarchical distillation framework for multimodal emotion recognition that decouples features into homogeneous and heterogeneous components, enabling effective cross-modal knowledge transfer and improved recognition accuracy.
Contribution
The proposed Decoupled Hierarchical Multimodal Distillation (DHMD) framework combines feature decoupling with a two-stage knowledge distillation strategy to improve multimodal emotion recognition.
Findings
DHMD outperforms state-of-the-art MER methods on the CMU-MOSI and CMU-MOSEI datasets.
The framework achieves up to 2.4% accuracy improvement over existing methods.
Visualizations show meaningful distribution patterns in the modality-irrelevant and modality-exclusive feature spaces.
Abstract
Human multimodal emotion recognition (MER) seeks to infer human emotions by integrating information from language, visual, and acoustic modalities. Although existing MER approaches have achieved promising results, they still struggle with inherent multimodal heterogeneities and varying contributions from different modalities. To address these challenges, we propose a novel framework, Decoupled Hierarchical Multimodal Distillation (DHMD). DHMD decouples each modality's features into modality-irrelevant (homogeneous) and modality-exclusive (heterogeneous) components using a self-regression mechanism. The framework employs a two-stage knowledge distillation (KD) strategy: (1) coarse-grained KD via a Graph Distillation Unit (GD-Unit) in each decoupled feature space, where a dynamic graph facilitates adaptive distillation among modalities, and (2) fine-grained KD through a cross-modal…
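The coarse-grained stage described above, in which a dynamic graph weights how strongly each modality distills into the others, can be illustrated with a minimal sketch. The abstract does not say how edge weights are produced or which discrepancy measure is used, so the softmax normalization over incoming edges, the L1 distance between logits, and all names below (`graph_distillation_loss`, `edge_scores`) are assumptions made for illustration, not the authors' implementation.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def l1_distance(a, b):
    """Sum of absolute differences between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def graph_distillation_loss(logits, edge_scores):
    """Weighted sum of pairwise logit discrepancies over a directed graph.

    logits:      dict mapping modality name -> list of class logits.
    edge_scores: dict mapping (source, target) -> raw edge score. In a real
                 model these would be learned; here they are plain inputs
                 (assumption).
    Returns (loss, weights), where weights[(source, target)] is the
    normalized distillation strength from source to target.
    """
    loss = 0.0
    weights = {}
    for target in logits:
        sources = [m for m in logits if m != target]
        # Assumption: incoming edge weights of each target sum to 1.
        normalized = softmax([edge_scores[(s, target)] for s in sources])
        for source, weight in zip(sources, normalized):
            weights[(source, target)] = weight
            loss += weight * l1_distance(logits[source], logits[target])
    return loss, weights


# Hypothetical logits for the three modalities named in the abstract.
example_logits = {
    "language": [2.0, 0.0],
    "visual": [1.0, 1.0],
    "acoustic": [0.0, 2.0],
}
# Uniform raw scores: every source contributes equally to each target.
example_scores = {
    (s, t): 0.0 for s in example_logits for t in example_logits if s != t
}
example_loss, example_weights = graph_distillation_loss(
    example_logits, example_scores
)
```

Raising the raw score of one edge, for example `("language", "visual")`, increases that edge's normalized weight, so disagreement between the language and visual logits then contributes more to the loss. That is the sense in which a graph with adjustable edges can make distillation adaptive across modalities.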
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Face and Expression Recognition
