InterMulti: Multi-view Multimodal Interactions with Text-dominated Hierarchical High-order Fusion for Emotion Analysis
Feng Qiu, Wanzeng Kong, Yu Ding

TL;DR
InterMulti is a multimodal emotion analysis framework that captures complex interactions among speech content, voice tone, and facial expression signals from multiple views and combines them with text-dominated hierarchical high-order fusion. The authors report that it outperforms state-of-the-art methods on MOSEI, MOSI, and IEMOCAP.
Contribution
The paper introduces a text-dominated hierarchical high-order fusion module for integrating multimodal signals in emotion analysis, alongside a decomposition of those signals into modality-full, modality-shared, and modality-specific interaction representations.
Findings
Reported to outperform state-of-the-art methods on the MOSEI, MOSI, and IEMOCAP datasets.
Effectively captures complex multimodal interactions.
Balances modality contributions for improved emotion recognition.
Abstract
Humans are sophisticated at reading interlocutors' emotions from multimodal signals, such as speech contents, voice tones and facial expressions. However, machines might struggle to understand various emotions due to the difficulty of effectively decoding emotions from the complex interactions between multimodal signals. In this paper, we propose a multimodal emotion analysis framework, InterMulti, to capture complex multimodal interactions from different views and identify emotions from multimodal signals. Our proposed framework decomposes signals of different modalities into three kinds of multimodal interaction representations, including a modality-full interaction representation, a modality-shared interaction representation, and three modality-specific interaction representations. Additionally, to balance the contribution of different modalities and learn a more informative latent…
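To make the three kinds of interaction representations named in the abstract more concrete, here is a toy sketch in plain Python. It is not the paper's architecture: the abstract does not say how the representations are computed, so the concatenation, element-wise mean, and residual used below are stand-in assumptions chosen only for illustration. The function name `decompose` and the modality labels `text`, `audio`, and `visual` are likewise hypothetical.

```python
from typing import Dict, List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(values) / n for values in zip(*vectors)]


def decompose(features: Dict[str, Vector]) -> Dict[str, object]:
    """Toy split of per-modality features into three views.

    Assumption: every operation here is an illustrative stand-in,
    not the learned encoders a real model would use.
    """
    names = sorted(features)
    # Modality-full view: all modalities side by side (assumed: concatenation).
    full = [x for name in names for x in features[name]]
    # Modality-shared view: what the modalities have in common (assumed: mean).
    shared = mean_vector([features[name] for name in names])
    # Modality-specific views: one per modality (assumed: residual from shared).
    specific = {
        name: [a - b for a, b in zip(features[name], shared)]
        for name in names
    }
    return {"full": full, "shared": shared, "specific": specific}


# Hypothetical 2-dimensional features for three modalities.
features = {"text": [1.0, 2.0], "audio": [3.0, 4.0], "visual": [5.0, 6.0]}
views = decompose(features)
```

With three input modalities the sketch yields one full view, one shared view, and three specific views, matching the counts given in the abstract. A trained model would replace each stand-in operation with learned components.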
Taxonomy
Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining
