Debiased Multimodal Understanding for Human Language Sequences
Zhi Xu, Dingkang Yang, Mingcheng Li, Yuzheng Wang, Zhaoyu Chen, Jiawei Chen, Jinjie Wei, Lihua Zhang

TL;DR
This paper introduces SuCI, a causal intervention module that mitigates subject bias in multimodal language understanding (MLU) and improves model generalization across diverse subjects.
Contribution
It formulates a causal graph for MLU, identifies subject bias as a confounder, and proposes SuCI to disentangle and remove this bias for more unbiased predictions.
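As background, and not as a description of SuCI's internals (this summary does not give them), the standard way to remove an observed confounder from a causal graph is the backdoor adjustment. The sketch below assumes a graph in which the subject S influences both the multimodal input X and the label Y. These symbols are chosen here for illustration and are not taken from the paper. Ordinary conditioning weights each subject by how likely it is given the input, while intervention weights each subject by its prior:

```latex
\begin{align}
  % Observational: subjects weighted by their posterior given the input
  P(Y \mid X) &= \sum_{s} P(Y \mid X, S = s)\, P(S = s \mid X) \\
  % Interventional (backdoor adjustment): subjects weighted by their prior
  P(Y \mid \mathrm{do}(X)) &= \sum_{s} P(Y \mid X, S = s)\, P(S = s)
\end{align}
```

The only change is the weight on each subject. Replacing P(S = s | X) with P(S = s) severs the S → X path, so a predictor can no longer profit from guessing which subject a sample came from. How SuCI estimates or approximates these terms is not stated here.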
Findings
SuCI improves performance on multiple MLU benchmarks.
The method effectively reduces subject-specific biases.
Results demonstrate enhanced generalization to new subjects.
Abstract
Human multimodal language understanding (MLU) is an indispensable component of expression analysis (e.g., sentiment or humor) from heterogeneous modalities, including visual postures, linguistic contents, and acoustic behaviours. Existing works invariably focus on designing sophisticated structures or fusion strategies to achieve impressive improvements. Unfortunately, they all suffer from the subject variation problem due to data distribution discrepancies among subjects. Concretely, MLU models are easily misled by distinct subjects with different expression customs and characteristics in the training data to learn subject-specific spurious correlations, limiting performance and generalizability across new subjects. Motivated by this observation, we introduce a recapitulative causal graph to formulate the MLU procedure and analyze the confounding effect of subjects. Then, we propose…
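A small discrete example shows how subjects can create the spurious correlations described above. The sketch is hypothetical: the two subjects, the binary "smiles" cue, and every probability are invented for illustration. It also applies the generic backdoor adjustment rather than SuCI itself, whose mechanism is not detailed in this summary.

```python
"""Toy illustration of subject confounding and backdoor adjustment.

All numbers are invented for illustration; none come from the paper.
Variables: S = subject, X = binary expression cue (e.g. "smiles"),
Y = binary label (e.g. positive sentiment). Assumed graph: S -> X, S -> Y, X -> Y.
"""

# Prior over subjects, P(S=s).
P_S = {"A": 0.5, "B": 0.5}
# Expression custom, P(X=1 | S=s): subject A smiles often, subject B rarely.
P_X1_GIVEN_S = {"A": 0.9, "B": 0.1}
# Label model, P(Y=1 | X=x, S=s): subject A is labeled positive more often.
P_Y1_GIVEN_XS = {
    ("A", 1): 0.8, ("A", 0): 0.7,
    ("B", 1): 0.4, ("B", 0): 0.2,
}


def p_x_given_s(x: int, s: str) -> float:
    """P(X=x | S=s) for the binary cue."""
    p1 = P_X1_GIVEN_S[s]
    return p1 if x == 1 else 1.0 - p1


def p_s_given_x(x: int) -> dict[str, float]:
    """Posterior over subjects after observing X=x, via Bayes' rule."""
    joint = {s: p_x_given_s(x, s) * P_S[s] for s in P_S}
    total = sum(joint.values())
    return {s: v / total for s, v in joint.items()}


def observational(x: int) -> float:
    """P(Y=1 | X=x): subjects weighted by P(S | X), so S leaks into the estimate."""
    post = p_s_given_x(x)
    return sum(P_Y1_GIVEN_XS[(s, x)] * post[s] for s in P_S)


def interventional(x: int) -> float:
    """P(Y=1 | do(X=x)): backdoor adjustment, subjects weighted by the prior P(S)."""
    return sum(P_Y1_GIVEN_XS[(s, x)] * P_S[s] for s in P_S)


if __name__ == "__main__":
    obs_gap = observational(1) - observational(0)
    do_gap = interventional(1) - interventional(0)
    print(f"observational effect of X on Y:  {obs_gap:.2f}")
    print(f"interventional effect of X on Y: {do_gap:.2f}")
```

With these made-up numbers, the observational gap is about 0.51 (0.76 vs 0.25), while the interventional gap is about 0.15 (0.60 vs 0.45). Most of the apparent link between the cue and the label arises because subject A both smiles often and is often labeled positive. A model trained on such data would learn that subject-specific shortcut.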
Taxonomy
Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning
Methods: Focus
