Evaluating and Steering Modality Preferences in Multimodal Large Language Model

Yu Zhang; Jinlong Ma; Yongshuai Hou; Xuefeng Bai; Kehai Chen; Yang Xiang; Jun Yu; Min Zhang

arXiv:2505.20977 · cs.CL · February 5, 2026

TL;DR

This paper investigates modality preferences in multimodal large language models, introduces a benchmark to evaluate these preferences, and proposes a method to steer and control them to improve task performance.

Contribution

It introduces the MC² benchmark for evaluating modality preference and proposes a novel representation engineering method to steer these preferences without fine-tuning.

Findings

01. All tested MLLMs show clear modality preferences.

02. Modality preference correlates with downstream task performance.

03. The proposed steering method effectively controls modality preference.

Abstract

Multi-modal large language models (MLLMs) have achieved remarkable success on complex multi-modal tasks. However, it remains insufficiently explored whether they exhibit modality preference, a tendency to favor one modality over another when processing multi-modal contexts. To study this question, we introduce the MC² benchmark, which constructs controlled evidence-conflict scenarios to systematically evaluate modality preference in decision-making. Extensive experiments reveal that all 20 tested MLLMs generally demonstrate clear modality preferences, and such preferences can serve as a useful indicator of downstream task performance of MLLMs. Further analysis shows that modality preference can be controlled by instruction guidance and captured within the latent representations of MLLMs. Built on these insights, we propose a probing and steering…
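To make the idea of measuring modality preference under evidence conflict concrete, here is a minimal sketch. It assumes each conflict case has already been labeled by which modality the model's answer sided with; the function name `modality_preference`, the labels `"vision"`, `"text"`, and `"other"`, and the sample data are all hypothetical and are not taken from the paper, whose actual metric may differ.

```python
from collections import Counter

# Hypothetical labels: which side of the conflict the model's answer agreed with.
LABELS = ("vision", "text", "other")

def modality_preference(choices):
    """Return the share of conflict cases resolved in favor of each label."""
    total = len(choices)
    if total == 0:
        raise ValueError("no choices to score")
    counts = Counter(choices)
    return {label: counts.get(label, 0) / total for label in LABELS}

# Made-up outcomes for five conflict cases (illustration only, not benchmark data).
choices = ["vision", "text", "vision", "other", "vision"]
scores = modality_preference(choices)
print(scores)
```

Under this simplified tally, a model whose vision share is well above its text share would be read as vision-leaning on the sampled cases.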

Code & Models

Datasets

271754echo/MC2 (dataset, 25 downloads)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling