Beyond Text-Dominance: Understanding Modality Preference of Omni-modal Large Language Models

Xinru Yan; Boxi Cao; Yaojie Lu; Hongyu Lin; Weixiang Zhou; Le Sun; Xianpei Han

arXiv:2604.16902·cs.AI·April 30, 2026

TL;DR

This paper investigates modality preference in Omni-modal Large Language Models (OLLMs). It finds that most OLLMs favor visual input when modalities conflict, traces how that preference emerges across layers, and discusses what it implies for trustworthiness.

Contribution

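The paper introduces a conflict-based benchmark and the modality selection rate metric for quantifying modality preference. It also uses layer-wise probing to give a mechanistic account of that preference and to build diagnostic tools for OLLMs.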

Findings

01 Most OLLMs exhibit a pronounced visual preference, unlike traditional VLMs, which are text-dominant.

02 Modality preference is not static: it emerges progressively in the mid-to-late layers of the models.

03 Internal signals from layer-wise probing can be used to diagnose cross-modal hallucinations.

Abstract

Native Omni-modal Large Language Models (OLLMs) have shifted from pipeline architectures to unified representation spaces. However, this native integration gives rise to a critical yet underexplored phenomenon: modality preference. To bridge this gap, we first systematically quantify modality preference of OLLMs using a newly-curated conflict-based benchmark and the modality selection rate metric. Our evaluation of ten representative OLLMs reveals a notable paradigm shift: unlike the "text-dominance" of traditional VLMs, most OLLMs exhibit a pronounced visual preference. To further understand the underlying mechanism, we conduct layer-wise probing and demonstrate that such modality preference is not static but emerges progressively in the mid-to-late layers. Building upon these insights, we leverage these internal signals to diagnose cross-modal hallucinations, achieving competitive…
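The abstract names a modality selection rate metric computed on conflict-based samples but does not define it here. As a rough illustration only, the sketch below assumes the rate is the fraction of conflict samples on which the model's answer agrees with the answer supported by each modality. The function names, the exact-match comparison, and the toy samples are all hypothetical and are not taken from the paper or its repository.

```python
from collections import Counter


def attribute_choice(model_answer, answers_by_modality):
    """Return the modality whose supported answer matches the model's answer.

    Assumption: matching is a case-insensitive exact string comparison.
    Returns "other" when the answer agrees with no modality.
    """
    normalized = model_answer.strip().lower()
    for modality, answer in answers_by_modality.items():
        if normalized == answer.strip().lower():
            return modality
    return "other"


def modality_selection_rate(choices):
    """Map each chosen modality to its share of all conflict samples."""
    if not choices:
        raise ValueError("need at least one conflict sample")
    counts = Counter(choices)
    total = len(choices)
    return {modality: count / total for modality, count in counts.items()}


# Toy conflict samples (invented): each pairs a model answer with the
# answers that the conflicting modalities would each support.
samples = [
    ("cat", {"vision": "cat", "text": "dog"}),
    ("dog", {"vision": "cat", "text": "dog"}),
    ("Cat ", {"vision": "cat", "audio": "bird"}),
    ("fish", {"vision": "cat", "text": "dog"}),
]

choices = [attribute_choice(answer, by_mod) for answer, by_mod in samples]
rates = modality_selection_rate(choices)
print(rates)
```

Under this assumed definition, a model with a visual preference would show a higher rate for `"vision"` than for the other modalities. The `"other"` bucket keeps answers that follow neither conflicting input from being silently counted toward a modality.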

Code & Models

Repositories

icip-cas/OmniPreference (GitHub)
