Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

Yuzhe Weng; Haotian Wang; Tian Gao; Kewei Li; Shutong Niu; Jun Du

arXiv:2410.15029·cs.CL·March 25, 2025

TL;DR

This paper introduces a novel self-distillation framework with cross-attention and autoencoder modules to improve multimodal sentiment analysis, especially when text data is missing, achieving superior results on CMU-MOSEI.

Contribution

The study presents a new Double-Flow Self-Distillation Framework whose UMCA and MIA modules handle a missing text modality in sentiment analysis.

Findings

01 Outperforms existing models on CMU-MOSEI when text is missing

02 Uses an LLM-based model to simulate text representations from audio

03 Introduces RNC loss for better alignment of representations
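The RNC loss named in finding 03 is not defined on this page. The sketch below assumes it refers to a Rank-N-Contrast-style objective, in which samples whose sentiment labels are closer should also be closer in representation space. The function name `rnc_loss`, the negative Euclidean distance as similarity, and the default temperature are illustrative assumptions, not the authors' implementation.

```python
import math


def rnc_loss(features, labels, temperature=2.0):
    """Rank-N-Contrast-style loss over one batch (illustrative sketch).

    features: list of equal-length float vectors.
    labels:   list of continuous sentiment scores, one per vector.
    """
    n = len(features)
    if n < 2:
        return 0.0

    def sim(a, b):
        # Assumed similarity: negative Euclidean distance.
        return -math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    total, count = 0.0, 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d_ij = abs(labels[i] - labels[j])
            # Compare j against every sample at least as far from i
            # in label space as j is (j itself is always included).
            denom = sum(
                math.exp(sim(features[i], features[k]) / temperature)
                for k in range(n)
                if k != i and abs(labels[i] - labels[k]) >= d_ij
            )
            num = math.exp(sim(features[i], features[j]) / temperature)
            total += -math.log(num / denom)
            count += 1
    return total / count
```

Under these assumptions, a batch whose feature ordering matches its label ordering should score lower than one where the ordering is scrambled, which is the property such a loss would exploit to align simulated and real representations.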

Abstract

In multimodal sentiment analysis, collecting text data is often more challenging than collecting video or audio, owing to higher annotation costs and inconsistent automatic speech recognition (ASR) quality. To address this challenge, we develop a robust model that integrates multimodal sentiment information even when the text modality is absent. Specifically, we propose a Double-Flow Self-Distillation Framework, comprising Unified Modality Cross-Attention (UMCA) and a Modality Imagination Autoencoder (MIA), that handles both the complete-modality scenario and the missing-text scenario. When the text modality is missing, the framework uses an LLM-based model to simulate the text representation from the audio modality, while the MIA module supplements information from the other two modalities to make the simulated text representation…
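The missing-text path described in the abstract can be sketched roughly as follows. This is only a toy illustration of the routing logic: `simulate_text_from_audio` and `imagine_refine` are hypothetical stand-ins for the LLM-based model and the MIA module, and the arithmetic inside them is a placeholder, not the paper's method.

```python
def simulate_text_from_audio(audio_vec):
    # Hypothetical stand-in for the LLM-based model that maps an
    # audio representation to a simulated text representation.
    return [0.5 * a for a in audio_vec]


def imagine_refine(sim_text, audio_vec, video_vec, weight=0.25):
    # Hypothetical stand-in for the MIA module: mix information from
    # the audio and video representations into the simulated text.
    return [
        (1.0 - weight) * t + weight * 0.5 * (a + v)
        for t, a, v in zip(sim_text, audio_vec, video_vec)
    ]


def build_inputs(audio_vec, video_vec, text_vec=None):
    """Return the three modality representations passed on to fusion.

    If text_vec is None (text modality missing), a simulated text
    representation is derived from audio and refined with the other
    two modalities; otherwise the real text representation is used.
    """
    simulated = text_vec is None
    if simulated:
        text_vec = imagine_refine(
            simulate_text_from_audio(audio_vec), audio_vec, video_vec
        )
    return {
        "text": list(text_vec),
        "audio": list(audio_vec),
        "video": list(video_vec),
        "text_is_simulated": simulated,
    }


# Usage: one call with complete modalities, one with text missing.
full = build_inputs([1.0, 2.0], [3.0, 4.0], text_vec=[9.0, 9.0])
partial = build_inputs([1.0, 2.0], [3.0, 4.0])
```

Keeping the output shape identical in both cases is the point of the sketch: downstream fusion sees three representations whether or not real text was available.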

Code & Models

Repositories

warmcongee/sdumc
PyTorch · Official

Taxonomy

Topics: Sentiment Analysis and Opinion Mining

Methods: ALIGN · Masked autoencoder