Complementarity-Supervised Spectral-Band Routing for Multimodal Emotion Recognition
Zhexian Huang, Bo Zhao, Hui Ma, Zhishu Liu, Jie Zhang, Ruixin Zhang, Shouhong Ding, Zitong Yu

TL;DR
This paper introduces Atsuko, a novel multimodal emotion recognition model that decomposes features into frequency bands and uses a complementarity-guided routing mechanism to improve fusion of heterogeneous modalities.
Contribution
It proposes a multi-scale band decomposition and a complementarity-supervised routing framework for more effective multimodal emotion recognition.
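The summary names a complementarity-supervised routing framework but does not spell out the router or its supervision signal. The snippet below is a minimal sketch, assuming a generic softmax gate over (modality, band) experts plus a KL term that pulls the gate distribution toward a target distribution. The names `route`, `kl_to_target`, the scoring vector `w`, and the `target` distribution are all hypothetical. How Atsuko actually defines its complementarity target is not described in this excerpt, so `target` is only a placeholder.

```python
import numpy as np


def softmax(z, axis=-1):
    # Numerically stable softmax: subtract the max before exponentiating.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def route(expert_feats, w):
    """Hypothetical router: score each expert, gate with softmax, fuse.

    expert_feats: (E, D) pooled features, one row per (modality, band) expert.
    w:            (D,) hypothetical scoring vector (would be learned in practice).
    Returns the fused (D,) vector and the (E,) gate weights.
    """
    scores = expert_feats @ w            # (E,) one scalar score per expert
    gates = softmax(scores)              # (E,) non-negative, sums to 1
    fused = gates @ expert_feats         # (D,) gate-weighted sum of expert features
    return fused, gates


def kl_to_target(gates, target, eps=1e-12):
    """KL(target || gates): a generic auxiliary loss for supervising a router.

    `target` is a placeholder for whatever complementarity-derived distribution
    a real model would use; eps guards against log(0).
    """
    return float(np.sum(target * (np.log(target + eps) - np.log(gates + eps))))
```

With three modalities and three bands there are nine experts, so `expert_feats` would have shape `(9, D)`. Supervising the gates, rather than letting them follow whichever expert scores highest on its own, is one way a router can be discouraged from collapsing onto a single dominant modality.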
Findings
Achieves superior performance on multiple emotion recognition benchmarks.
Effectively models fine-grained cross-modal interactions.
Mitigates dominance of certain modalities through complementarity supervision.
Abstract
Multimodal emotion recognition fuses cues such as text, video, and audio to understand individual emotional states. Prior methods face two main limitations: they mechanically rely on each modality's independent unimodal performance, thereby missing its genuine complementary contribution, and their coarse-grained fusion conflicts with the fine-grained representations that emotion tasks require. Because inconsistent information density across heterogeneous modalities hinders inter-modal feature mining, we propose the Complementarity-Supervised Multi-Band Expert Network, named Atsuko, which models fine-grained complementary features via multi-scale band decomposition and expert collaboration. Specifically, we orthogonally decompose each modality's features into high-, mid-, and low-frequency components. Building upon this band-level routing, we design a modality-level router with a dual-path mechanism for fine-grained…
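The abstract states that each modality's features are orthogonally decomposed into high-, mid-, and low-frequency components, but this excerpt does not say which transform is used. A minimal sketch, assuming an FFT along the sequence (time) axis with three disjoint band masks, is shown below; the function name `split_bands` and the band-edge fractions `low_frac` and `mid_frac` are hypothetical choices, and the paper may use a different transform entirely.

```python
import numpy as np


def split_bands(x, low_frac=0.1, mid_frac=0.4):
    """Split a (T, D) feature sequence into low-, mid- and high-frequency parts.

    Hypothetical sketch: band edges are illustrative assumptions, not values
    from the paper. Disjoint masks that cover every frequency bin make the
    three parts sum back to x and be mutually orthogonal (by Parseval).
    """
    T = x.shape[0]
    spec = np.fft.rfft(x, axis=0)                 # (T // 2 + 1, D) one-sided spectrum along time
    n_bins = spec.shape[0]
    lo_end = max(1, int(low_frac * n_bins))       # low band always keeps the DC bin
    mid_end = min(n_bins, max(lo_end + 1, int((low_frac + mid_frac) * n_bins)))
    bands = []
    for start, end in [(0, lo_end), (lo_end, mid_end), (mid_end, n_bins)]:
        mask = np.zeros((n_bins, 1))
        mask[start:end] = 1.0                     # disjoint masks that together cover every bin
        bands.append(np.fft.irfft(spec * mask, n=T, axis=0))
    return bands                                  # [low, mid, high], each of shape (T, D)
```

For example, `low, mid, high = split_bands(np.random.default_rng(0).standard_normal((50, 32)))` returns three `(50, 32)` arrays whose sum reproduces the input up to floating-point error. Hard spectral masks are used here only because they make orthogonality and exact reconstruction easy to verify; a learned or wavelet-based split would be an equally plausible reading of "orthogonal decomposition".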
Taxonomy
Topics: Emotion and Mood Recognition · Face and Expression Recognition · Music and Audio Processing
