Evaluating and Steering Modality Preferences in Multimodal Large Language Model
Yu Zhang, Jinlong Ma, Yongshuai Hou, Xuefeng Bai, Kehai Chen, Yang Xiang, Jun Yu, Min Zhang

TL;DR
This paper investigates modality preferences in multimodal large language models, introduces a benchmark to evaluate these preferences, and proposes a method to steer and control them to improve task performance.
Contribution
It introduces the MC² benchmark for evaluating modality preference and proposes a novel representation engineering method to steer these preferences without fine-tuning.
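This summary does not say how the steering method works internally. A common generic recipe in representation engineering is to take hidden states from examples where a model followed the text versus the image, use the difference of their class means as a direction, and then either project onto it (probing) or add a scaled copy of it to a hidden state (steering). The sketch below assumes that recipe. The function names, the toy 2-D "hidden states", and the scale `alpha` are hypothetical illustrations, not the authors' implementation.

```python
from statistics import fmean

Vector = list[float]


def mean_vector(states: list[Vector]) -> Vector:
    """Element-wise mean of equal-length hidden-state vectors."""
    return [fmean(column) for column in zip(*states)]


def steering_vector(text_states: list[Vector], image_states: list[Vector]) -> Vector:
    """Difference of class means, pointing from image-following toward text-following states."""
    mu_text = mean_vector(text_states)
    mu_image = mean_vector(image_states)
    return [t - i for t, i in zip(mu_text, mu_image)]


def class_midpoint(text_states: list[Vector], image_states: list[Vector]) -> Vector:
    """Point halfway between the two class means; used as the probe's decision offset."""
    mu_text = mean_vector(text_states)
    mu_image = mean_vector(image_states)
    return [(t + i) / 2 for t, i in zip(mu_text, mu_image)]


def probe_score(hidden: Vector, direction: Vector, midpoint: Vector) -> float:
    """Linear probe: positive means the state leans text-ward, negative means image-ward."""
    return sum((h - m) * d for h, m, d in zip(hidden, midpoint, direction))


def steer(hidden: Vector, direction: Vector, alpha: float) -> Vector:
    """Shift a hidden state along the direction (alpha > 0 toward text, alpha < 0 toward image)."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


if __name__ == "__main__":
    # Hypothetical toy hidden states collected at one layer.
    text_states = [[1.0, 0.0], [3.0, 0.0]]
    image_states = [[0.0, 2.0], [0.0, 4.0]]

    v = steering_vector(text_states, image_states)
    mid = class_midpoint(text_states, image_states)
    print("direction:", v)
    print("probe on a text-like state:", probe_score([2.0, 0.0], v, mid))
    print("steered state:", steer([0.0, 0.0], v, 0.5))
```

In a real model the vectors would be per-layer activations and the shift would be applied during the forward pass; the sign of `alpha` selects which modality the output is nudged toward.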
Findings
All tested MLLMs show clear modality preferences.
Modality preference correlates with downstream task performance.
The proposed steering method effectively controls modality preference.
Abstract
Multi-modal large language models (MLLMs) have achieved remarkable success on complex multi-modal tasks. However, it remains insufficiently explored whether they exhibit modality preference, a tendency to favor one modality over another when processing multi-modal contexts. To study this question, we introduce the MC² benchmark, which constructs controlled evidence-conflict scenarios to systematically evaluate modality preference in decision-making. Extensive experiments reveal that all 20 tested MLLMs generally demonstrate clear modality preferences, and that such preferences can serve as a useful indicator of the downstream task performance of MLLMs. Further analysis shows that modality preference can be controlled by instruction guidance and is captured within the latent representations of MLLMs. Built on these insights, we propose a probing and steering…
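To make the evidence-conflict idea concrete: in each item the image supports one answer and the textual context supports a different one, so the model's answer reveals which modality it trusted. The sketch below assumes a simple exact-match scoring rule and a hypothetical preference score, (text-consistent − image-consistent) / (text-consistent + image-consistent), ranging from −1 (always follows the image) to +1 (always follows the text). The `ConflictItem` fields and the score itself are illustrative assumptions, not the MC² benchmark's actual metric.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConflictItem:
    image_answer: str  # answer supported only by the visual evidence
    text_answer: str   # conflicting answer supported only by the textual context
    model_answer: str  # what the MLLM actually output


def normalize(answer: str) -> str:
    """Case- and whitespace-insensitive exact match (a simplifying assumption)."""
    return answer.strip().lower()


def preference_counts(items: list[ConflictItem]) -> tuple[int, int, int]:
    """Count answers that follow the text, follow the image, or match neither."""
    text = image = other = 0
    for item in items:
        answer = normalize(item.model_answer)
        if answer == normalize(item.text_answer):
            text += 1
        elif answer == normalize(item.image_answer):
            image += 1
        else:
            other += 1
    return text, image, other


def modality_preference(items: list[ConflictItem]) -> float:
    """Hypothetical score in [-1, 1]: +1 = always text, -1 = always image, 0 = balanced."""
    text, image, _ = preference_counts(items)
    decided = text + image
    if decided == 0:
        return 0.0  # no item resolved to either modality
    return (text - image) / decided


if __name__ == "__main__":
    items = [
        ConflictItem(image_answer="red", text_answer="blue", model_answer="Blue"),
        ConflictItem(image_answer="cat", text_answer="dog", model_answer="dog"),
        ConflictItem(image_answer="two", text_answer="three", model_answer="two"),
        ConflictItem(image_answer="yes", text_answer="no", model_answer="maybe"),
    ]
    print(preference_counts(items), modality_preference(items))
```

Answers that match neither side are counted separately and excluded from the score, so a model that often hedges is not mistaken for a balanced one.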
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
