MIRAGE: The Illusion of Visual Understanding

Mohammad Asadi; Jack W. O'Sullivan; Fang Cao; Tahoura Nedaee; Kamyar Rajabalifardi; Fei-Fei Li; Ehsan Adeli; Euan Ashley

arXiv:2603.21687·cs.AI·April 3, 2026

TL;DR

This paper shows that multimodal AI models can generate detailed reasoning and score highly on benchmarks without any visual input. It exposes vulnerabilities in current evaluation methods and proposes a new benchmark for fair assessment.

Contribution

The paper uncovers mirage reasoning in multimodal models, demonstrates their high performance without images, and introduces B-Clean for unbiased evaluation of visual-language understanding.

Findings

01. Models generate detailed image descriptions without images.

02. Models achieve high benchmark scores without visual input.

03. Explicit instructions to guess reduce model performance.

Abstract

Multimodal AI systems have achieved remarkable performance across a broad range of real-world tasks, yet the mechanisms underlying visual-language reasoning remain surprisingly poorly understood. We report three findings that challenge prevailing assumptions about how these systems process and integrate visual information. First, frontier models readily generate detailed image descriptions and elaborate reasoning traces, including pathology-biased clinical findings, for images never provided; we term this phenomenon mirage reasoning. Second, without any image input, models also attain strikingly high scores across general and medical multimodal benchmarks, bringing into question their utility and design. In the most extreme case, our model achieved the top rank on a standard chest X-ray question-answering benchmark without access to any images. Third, when models were explicitly…
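The image-free evaluation described in the abstract can be illustrated with a minimal sketch. This is not the paper's code and not the B-Clean procedure. The names `Item`, `text_prior_model`, and `image_free_report`, as well as the toy questions, are hypothetical and introduced only for illustration. The idea is to score the same multiple-choice items with the image supplied and with it withheld, and to compare both against the chance rate. A model that stays well above chance with the image withheld is answering from textual cues rather than from the picture.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Item:
    """One multiple-choice question (hypothetical schema)."""
    question: str
    options: Sequence[str]
    answer: int                # index of the correct option
    image: Optional[bytes]     # raw image bytes, or None if absent


# A model is any callable mapping (question, options, image) to an option index.
Model = Callable[[str, Sequence[str], Optional[bytes]], int]


def accuracy(model: Model, items: Sequence[Item], withhold_image: bool) -> float:
    """Fraction of items answered correctly, optionally hiding every image."""
    correct = 0
    for item in items:
        image = None if withhold_image else item.image
        correct += int(model(item.question, item.options, image) == item.answer)
    return correct / len(items)


def chance_accuracy(items: Sequence[Item]) -> float:
    """Expected accuracy of uniform random guessing over each item's options."""
    return sum(1 / len(item.options) for item in items) / len(items)


def image_free_report(model: Model, items: Sequence[Item]) -> dict:
    """Accuracy with images, without images, and the chance baseline."""
    return {
        "with_image": accuracy(model, items, withhold_image=False),
        "without_image": accuracy(model, items, withhold_image=True),
        "chance": chance_accuracy(items),
    }


def text_prior_model(question: str, options: Sequence[str],
                     image: Optional[bytes]) -> int:
    """Stand-in model (assumption): ignores the image, picks the longest option."""
    return max(range(len(options)), key=lambda i: len(options[i]))


# Toy items, invented for this example only.
items = [
    Item("Which finding is shown?", ["Normal", "Pneumothorax on the left"], 1, b"img"),
    Item("What animal is shown?", ["A cat", "A very large dog"], 1, b"img"),
    Item("What colour is the car?", ["Red", "Blue", "Green", "Yellow"], 0, b"img"),
    Item("How many people appear?", ["One", "Two"], 0, b"img"),
]

if __name__ == "__main__":
    print(image_free_report(text_prior_model, items))
```

In this toy setup the stand-in model scores the same with and without images and well above the chance rate, which is the pattern that would flag a benchmark as answerable from text alone. A real study would replace `text_prior_model` with calls to an actual multimodal model and use a real benchmark.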
