Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Yuzhe Weng, Haotian Wang, Tian Gao, Kewei Li, Shutong Niu, Jun Du

TL;DR
This paper introduces a self-distillation framework with cross-attention and autoencoder modules for multimodal sentiment analysis when the text modality is missing, and reports results on CMU-MOSEI that outperform existing models in that setting.
Contribution
The study presents a Double-Flow Self-Distillation Framework with Unified Modality Cross-Attention (UMCA) and Modality Imagination Autoencoder (MIA) modules that handles both complete inputs and inputs with the text modality missing.
Findings
Outperforms existing models on CMU-MOSEI when text is missing
Uses an LLM-based model to simulate text representations from audio
Introduces an RNC loss for better alignment of representations
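The summary does not define the RNC loss. Assuming it refers to the Rank-N-Contrast loss for regression-style representation learning, the idea can be sketched as below: for each anchor, a sample is pulled closer than every sample whose label is at least as far from the anchor's label. The function name `rnc_loss`, the negative-L2 similarity, and the temperature value are illustrative assumptions, not details taken from the paper.

```python
import math

def rnc_loss(features, labels, temperature=2.0):
    """Sketch of a Rank-N-Contrast style loss (assumed form, not the authors' code).

    features: list of equal-length float vectors, one per sample
    labels:   list of float sentiment scores, one per sample
    """
    n = len(features)

    def sim(a, b):
        # Assumption: similarity is the negative Euclidean distance.
        return -math.dist(a, b)

    total, count = 0.0, 0
    for i in range(n):              # anchor sample
        for j in range(n):          # sample ranked against the anchor
            if j == i:
                continue
            d_ij = abs(labels[i] - labels[j])
            # Denominator: samples whose label is at least as far from the
            # anchor's label as sample j is (this set always includes j).
            denom = sum(
                math.exp(sim(features[i], features[k]) / temperature)
                for k in range(n)
                if k != i and abs(labels[i] - labels[k]) >= d_ij
            )
            num = math.exp(sim(features[i], features[j]) / temperature)
            total += -math.log(num / denom)
            count += 1
    return total / count

# Toy check: features ordered like the labels should score lower
# than features that scramble that ordering.
labels = [-2.0, 0.0, 2.0]
aligned = [[-2.0], [0.0], [2.0]]
scrambled = [[0.0], [2.0], [-2.0]]
loss_aligned = rnc_loss(aligned, labels)
loss_scrambled = rnc_loss(scrambled, labels)
```

In this toy setup, `loss_aligned` comes out lower than `loss_scrambled`, which is the behavior one would want from a loss that aligns representation distances with sentiment-score distances.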
Abstract
In multimodal sentiment analysis, collecting text data is often more challenging than video or audio due to higher annotation costs and inconsistent automatic speech recognition (ASR) quality. To address this challenge, our study has developed a robust model that effectively integrates multimodal sentiment information, even in the absence of text modality. Specifically, we have developed a Double-Flow Self-Distillation Framework, including Unified Modality Cross-Attention (UMCA) and Modality Imagination Autoencoder (MIA), which excels at processing both scenarios with complete modalities and those with missing text modality. In detail, when the text modality is missing, our framework uses the LLM-based model to simulate the text representation from the audio modality, while the MIA module supplements information from the other two modalities to make the simulated text representation…
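To make the two-flow idea concrete, here is a minimal pure-Python sketch under stated assumptions. It assumes that one flow sees the real text features, the other sees simulated text features, and a distillation term pulls the two fused representations together. The `fuse` function, the choice to let text tokens attend over the concatenation of all three modalities, the mean pooling, the MSE distillation term, and all toy numbers are hypothetical; the abstract does not specify how UMCA or MIA are built.

```python
import math

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def cross_attention(queries, keys, values):
    """Scaled dot-product attention: each query row attends over all key/value rows."""
    d = len(queries[0])
    out = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return out

def fuse(text, audio, video):
    """Hypothetical fusion: text tokens attend over all modalities, then mean-pool."""
    unified = text + audio + video          # concatenate token lists
    attended = cross_attention(text, unified, unified)
    n = len(attended)
    return [sum(row[c] for row in attended) / n for c in range(len(attended[0]))]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

# Toy 2-dimensional token features (made-up values).
real_text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.5, 0.5]]
video = [[0.2, 0.8]]
# Placeholder for the output of an audio-to-text simulator (not implemented here).
simulated_text = [[0.9, 0.1], [0.1, 0.9]]

full_repr = fuse(real_text, audio, video)          # complete-modality flow
missing_repr = fuse(simulated_text, audio, video)  # missing-text flow
distill_loss = mse(full_repr, missing_repr)        # assumed self-distillation term
```

In a trainable version, `distill_loss` would be added to the sentiment prediction loss so that the missing-text flow learns to match the complete-modality flow while both share the same weights.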
Taxonomy
Topics: Sentiment Analysis and Opinion Mining
Methods: ALIGN · Masked autoencoder
