Improving Multimodal Sentiment Analysis via Modality Optimization and Dynamic Primary Modality Selection

Dingkang Yang; Mingcheng Li; Xuecheng Wu; Zhaoyu Chen; Kaixun Jiang; Keliang Liu; Peng Zhai; Lihua Zhang

arXiv:2511.06328 · cs.CV · April 2, 2026

TL;DR

This paper introduces MODS, a multimodal sentiment analysis framework that selects the primary modality per sample and reduces redundancy and noise in the acoustic and visual sequences, improving performance on four benchmark datasets.

Contribution

The paper proposes MODS, a framework that combines a Graph-based Dynamic Sequence Compressor (GDC), a sample-adaptive primary modality selector, and a cross-attention module for multimodal sentiment analysis.

Findings

01

MODS outperforms existing methods on four benchmark datasets.

02

Dynamic modality selection improves sentiment prediction accuracy.

03

Redundancy reduction enhances the quality of multimodal representations.

Abstract

Multimodal Sentiment Analysis (MSA) aims to predict sentiment from language, acoustic, and visual data in videos. However, imbalanced unimodal performance often leads to suboptimal fused representations. Existing approaches typically adopt fixed primary modality strategies to maximize dominant modality advantages, yet fail to adapt to dynamic variations in modality importance across different samples. Moreover, non-language modalities suffer from sequential redundancy and noise, degrading model performance when they serve as primary inputs. To address these issues, this paper proposes a modality optimization and dynamic primary modality selection framework (MODS). First, a Graph-based Dynamic Sequence Compressor (GDC) is constructed, which employs capsule networks and graph convolution to reduce sequential redundancy in acoustic/visual modalities. Then, we develop a sample-adaptive…
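To make the idea of per-sample primary modality selection concrete, here is a minimal sketch in plain Python. It is not the selector from the paper, whose design is not described in the text above. It assumes each modality has already been pooled into a fixed-length feature vector, and that a hypothetical linear scorer (`scorer_weights`) rates each modality; the function names `softmax` and `select_primary` are illustrative only.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def select_primary(features, scorer_weights):
    """Pick the primary modality for one sample.

    features:       dict mapping modality name -> pooled feature vector (assumed input)
    scorer_weights: dict mapping modality name -> weights of a hypothetical linear scorer
    Returns the chosen modality name and the normalized importance weights.
    """
    names = list(features)
    # One scalar score per modality: dot product of scorer weights and features.
    scores = [
        sum(w * x for w, x in zip(scorer_weights[name], features[name]))
        for name in names
    ]
    weights = softmax(scores)
    # The modality with the largest weight acts as the primary one for this sample.
    best = max(range(len(names)), key=lambda i: weights[i])
    return names[best], dict(zip(names, weights))


# Illustrative scorer and two toy samples (all values are made up).
scorer = {"language": [1.0, 0.0], "acoustic": [1.0, 0.0], "visual": [1.0, 0.0]}
sample_a = {"language": [1.0, 0.0], "acoustic": [0.0, 1.0], "visual": [0.5, 0.5]}
sample_b = {"language": [0.1, 0.0], "acoustic": [2.0, 0.0], "visual": [0.5, 0.5]}

print(select_primary(sample_a, scorer)[0])  # language
print(select_primary(sample_b, scorer)[0])  # acoustic
```

The two toy samples choose different primary modalities, which is the behavior a fixed primary modality strategy cannot provide. In a trained model the scorer would be learned jointly with the rest of the network rather than set by hand.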
