Debiased Multimodal Understanding for Human Language Sequences

Zhi Xu; Dingkang Yang; Mingcheng Li; Yuzheng Wang; Zhaoyu Chen; Jiawei Chen; Jinjie Wei; Lihua Zhang

arXiv:2403.05025 · cs.AI · December 16, 2024 · 3 citations

TL;DR

This paper introduces a causal intervention module called SuCI to mitigate subject bias in multimodal language understanding, improving model generalization across diverse subjects.

Contribution

The paper formulates a causal graph for MLU, identifies subjects as a confounder, and proposes SuCI to disentangle and remove the resulting subject bias, yielding less biased predictions.
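To make the notion of a confounder concrete, the sketch below applies backdoor adjustment, a standard causal-inference technique, to a toy discrete problem. It is an illustration only: the summary above does not say how SuCI performs its intervention, and the subjects `A` and `B`, the binary feature `x`, the label `y`, and all probabilities are hypothetical values chosen for the example.

```python
from collections import defaultdict

def build_joint(p_s, p_x1_given_s, p_y1_given_sx):
    """Build the joint table P(s, x, y) for a binary feature x and binary label y."""
    joint = {}
    for s, ps in p_s.items():
        for x in (0, 1):
            px = p_x1_given_s[s] if x == 1 else 1.0 - p_x1_given_s[s]
            for y in (0, 1):
                py1 = p_y1_given_sx[(s, x)]
                py = py1 if y == 1 else 1.0 - py1
                joint[(s, x, y)] = ps * px * py
    return joint

def observational(joint, x):
    """P(y=1 | x): the correlation a model sees when subjects are ignored."""
    num = sum(p for (s, xv, y), p in joint.items() if xv == x and y == 1)
    den = sum(p for (s, xv, y), p in joint.items() if xv == x)
    return num / den

def backdoor_adjusted(joint, x):
    """P(y=1 | do(x)) = sum over s of P(y=1 | x, s) * P(s)."""
    p_s = defaultdict(float)
    p_sx = defaultdict(float)
    p_sx_y1 = defaultdict(float)
    for (s, xv, y), p in joint.items():
        p_s[s] += p
        p_sx[(s, xv)] += p
        if y == 1:
            p_sx_y1[(s, xv)] += p
    return sum(
        p_sx_y1[(s, x)] / p_sx[(s, x)] * p_s[s]
        for s in p_s
        if p_sx[(s, x)] > 0
    )

# Hypothetical toy data: subject A smiles often (x=1) and is usually labeled
# positive (y=1); subject B rarely smiles and is rarely labeled positive.
P_S = {"A": 0.5, "B": 0.5}
P_X1_GIVEN_S = {"A": 0.8, "B": 0.2}
P_Y1_GIVEN_SX = {("A", 1): 0.9, ("A", 0): 0.7, ("B", 1): 0.3, ("B", 0): 0.1}

joint = build_joint(P_S, P_X1_GIVEN_S, P_Y1_GIVEN_SX)
print(f"P(y=1 | x=1)     = {observational(joint, 1):.2f}")
print(f"P(y=1 | do(x=1)) = {backdoor_adjusted(joint, 1):.2f}")
```

In this toy setting the observational estimate for `x=1` is about 0.78, while the adjusted estimate is 0.60. The gap arises because the feature partly acts as a proxy for subject identity, which is the kind of subject-specific spurious correlation the paper aims to remove.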

Findings

1. SuCI improves performance on multiple MLU benchmarks.

2. The method effectively reduces subject-specific biases.

3. Results demonstrate enhanced generalization to new subjects.

Abstract

Human multimodal language understanding (MLU) is an indispensable component of expression analysis (e.g., sentiment or humor) from heterogeneous modalities, including visual postures, linguistic contents, and acoustic behaviours. Existing works invariably focus on designing sophisticated structures or fusion strategies to achieve impressive improvements. Unfortunately, they all suffer from the subject variation problem due to data distribution discrepancies among subjects. Concretely, MLU models are easily misled by distinct subjects with different expression customs and characteristics in the training data to learn subject-specific spurious correlations, limiting performance and generalizability across new subjects. Motivated by this observation, we introduce a recapitulative causal graph to formulate the MLU procedure and analyze the confounding effect of subjects. Then, we propose…

Taxonomy

Topics: Natural Language Processing Techniques · Intelligent Tutoring Systems and Adaptive Learning