Disentangling Bias by Modeling Intra- and Inter-modal Causal Attention for Multimodal Sentiment Analysis

Menghua Jiang; Yuxia Lin; Baoliang Chen; Haifeng Hu; Yuncheng Jiang; Sijie Mai

arXiv:2508.04999·cs.LG·May 21, 2026

TL;DR

This paper introduces a causal intervention framework for multimodal sentiment analysis that disentangles true causal signals from spurious correlations across modalities, enhancing model robustness.

Contribution

The proposed MMCI framework models intra- and inter-modal dependencies as a multi-relational graph and applies causal intervention to improve generalization in MSA.
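The multi-relational graph idea can be pictured with a small, dependency-free sketch. It assumes that each modality contributes a sequence of nodes and that every ordered modality pair (for example `text->audio`) is its own relation type. Under that assumption there are 3 intra-modal and 6 inter-modal relations. The paper's actual relation set and edge construction may differ. The names `build_multirelational_graph` and `is_intra` are hypothetical and used only for illustration.

```python
from itertools import product

# Hypothetical modality order; node ids are assigned contiguously in this order.
MODALITIES = ("text", "audio", "visual")


def build_multirelational_graph(lengths):
    """Build a typed edge list over text, audio, and visual nodes.

    Assumption (not from the paper): one relation per ordered modality pair,
    giving 3 intra-modal relations and 6 inter-modal relations.
    """
    # Map each modality to the global node ids it owns.
    node_ids = {}
    start = 0
    for m in MODALITIES:
        node_ids[m] = list(range(start, start + lengths[m]))
        start += lengths[m]

    # Connect every source node to every target node under the pair's relation,
    # skipping self-loops inside a modality.
    edges = {}
    for src, dst in product(MODALITIES, repeat=2):
        edges[f"{src}->{dst}"] = [
            (i, j) for i in node_ids[src] for j in node_ids[dst] if i != j
        ]
    return node_ids, edges


def is_intra(relation):
    """True for intra-modal relations such as 'audio->audio'."""
    src, dst = relation.split("->")
    return src == dst


node_ids, edges = build_multirelational_graph({"text": 3, "audio": 2, "visual": 2})
intra = [r for r in edges if is_intra(r)]
print(len(edges), len(intra))  # 9 relations, 3 of them intra-modal
```

Keeping the relations separate means a relational graph layer can treat intra- and inter-modal messages differently. An R-GCN-style layer with one weight matrix per relation is one example of such a layer.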

Findings

1. Improves performance on standard MSA datasets
2. Effectively suppresses biases and spurious correlations
3. Enhances robustness under distribution shifts

Abstract

Multimodal sentiment analysis (MSA) aims to understand human emotions by integrating information from multiple modalities, such as text, audio, and visual data. However, existing methods often suffer from spurious correlations both within and across modalities, leading models to rely on statistical shortcuts rather than true causal relationships, thereby undermining generalization. To mitigate this issue, we propose a Multi-relational Multimodal Causal Intervention (MMCI) framework, which leverages the backdoor adjustment from causal theory to address the confounding effects of such shortcuts. Specifically, we first model the multimodal inputs as a multi-relational graph to explicitly capture intra- and inter-modal dependencies. Then, we apply an attention mechanism to separately estimate and disentangle the causal features and shortcut features corresponding to these intra- and…
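The disentangle-then-intervene step can be illustrated with a minimal, dependency-free sketch. The backdoor adjustment estimates P(Y | do(X)) = Σ_z P(Y | X, z) P(z). It does this by averaging predictions over confounder strata z, rather than letting X select its own confounders. Everything below is an illustrative assumption, and the abstract is truncated, so this is not MMCI's exact formulation:

- Complementary sigmoid masks split node features into a causal summary and a shortcut summary.
- The strata are a small hypothetical `confounder_bank` of shortcut features, with prior `prior`.
- Each stratum is combined with the causal feature by addition.
- `linear` stands in for the sentiment classifier.

```python
import math


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def weighted_mean(vectors, weights):
    """Weighted average of equal-length feature vectors."""
    total = sum(weights)
    dim = len(vectors[0])
    return [sum(w * v[d] for v, w in zip(vectors, weights)) / total for d in range(dim)]


def disentangle(node_feats, scores):
    """Split nodes into causal and shortcut summaries with complementary soft masks.

    Hypothetical: scores would come from an attention layer; a node with a high
    score contributes mostly to the causal summary, a low score to the shortcut one.
    """
    causal_w = [sigmoid(s) for s in scores]
    shortcut_w = [sigmoid(-s) for s in scores]  # equals 1 - causal_w, computed stably
    return weighted_mean(node_feats, causal_w), weighted_mean(node_feats, shortcut_w)


def linear(x, weights, bias):
    """Hypothetical stand-in classifier producing a sentiment score."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def backdoor_predict(causal, confounder_bank, prior, weights, bias):
    """Approximate sum_z P(Y | X, z) P(z): score the causal feature combined
    (here, by addition) with each stratum z_k, then weight by the prior P(z_k)."""
    assert abs(sum(prior) - 1.0) < 1e-9, "prior must sum to 1"
    return sum(
        p * linear([c + z for c, z in zip(causal, z_k)], weights, bias)
        for z_k, p in zip(confounder_bank, prior)
    )


causal, shortcut = disentangle([[1.0, 0.0], [0.0, 1.0]], [3.0, -3.0])
score = backdoor_predict([1.0, 2.0], [[0.0, 0.0], [2.0, 0.0]], [0.5, 0.5], [1.0, 1.0], 0.0)
print(score)  # 4.0
```

Because `linear` is linear, the weighted sum over strata equals a single evaluation at the prior-weighted mean confounder. With a nonlinear classifier, each stratum has to be scored separately or approximated, which is why the loop is written out explicitly.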
