On Modality Bias Recognition and Reduction
Yangyang Guo, Liqiang Nie, Harry Cheng, Zhiyong Cheng, Mohan Kankanhalli, Alberto Del Bimbo

TL;DR
This paper systematically studies modality bias in multi-modal classification, highlighting how spurious correlations cause dominance of certain modalities, and proposes a loss function to mitigate this bias, improving model performance.
Contribution
It introduces a comprehensive analysis of modality bias, creates new OoD datasets for evaluation, and proposes a plug-and-play loss to reduce bias and enhance multi-modal learning.
Findings
The proposed method improves performance across multiple datasets.
Existing methods suffer from modality bias in OoD settings.
The loss function effectively reduces modality dominance in models.
Abstract
Making each modality in multi-modal data contribute is of vital importance to learning a versatile multi-modal model. Existing methods, however, are often dominated by one or a few modalities during model training, resulting in sub-optimal performance. In this paper, we refer to this problem as modality bias and attempt to study it systematically and comprehensively in the context of multi-modal classification. After several empirical analyses, we recognize that one modality affects the model prediction more simply because this modality has a spurious correlation with instance labels. To facilitate evaluation of the modality bias problem, we construct two datasets, for the colored digit recognition and video action recognition tasks respectively, in line with the Out-of-Distribution (OoD) protocol. Collaborating with the benchmarks in the visual…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
