Seeing Is Believing? A Benchmark for Multimodal Large Language Models on Visual Illusions and Anomalies

Wenjin Hou; Wei Liu; Han Hu; Xiaoxiao Sun; Serena Yeung-Levy; Hehe Fan

arXiv:2602.01816 · cs.CV · February 3, 2026

TL;DR

This paper introduces VIA-Bench, a new benchmark for testing multimodal large language models on visual illusions and anomalies, revealing significant vulnerabilities and highlighting the gap between machine and human perception.

Contribution

The paper presents VIA-Bench, a comprehensive benchmark with over 1,000 questions for evaluating MLLMs on visual illusions and anomalies. It exposes their weaknesses and shows that Chain-of-Thought reasoning adds little robustness.

Findings

1. MLLMs show significant vulnerabilities on visual illusions.
2. Chain-of-Thought reasoning offers negligible robustness.
3. Models often fail under illusory stimuli, unlike humans.

Abstract

Multimodal Large Language Models (MLLMs) have shown remarkable proficiency on general-purpose vision-language benchmarks, reaching or even exceeding human-level performance. However, these evaluations typically rely on standard in-distribution data, leaving the robustness of MLLMs largely unexamined when faced with scenarios that defy common-sense priors. To address this gap, we introduce VIA-Bench, a challenging benchmark designed to probe model performance on visual illusions and anomalies. It includes six core categories: color illusions, motion illusions, gestalt illusions, geometric and spatial illusions, general visual illusions, and visual anomalies. Through careful human-in-the-loop review, we construct over 1K high-quality question-answer pairs that require nuanced visual reasoning. Extensive evaluation of over 20 state-of-the-art MLLMs, including proprietary, open-source, and…
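The abstract describes the benchmark as question-answer pairs grouped into six categories. As a minimal sketch of how per-category scores over such a benchmark might be tallied, the Python below assumes a hypothetical record layout (`QAItem` with `question_id`, `category`, `answer`) and exact-match scoring of short answers. Neither the schema nor the scoring rule comes from the paper, and `per_category_accuracy` is an illustrative helper, not part of any released VIA-Bench code.

```python
from collections import defaultdict
from dataclasses import dataclass

# The six VIA-Bench categories as named in the abstract.
CATEGORIES = (
    "color illusions",
    "motion illusions",
    "gestalt illusions",
    "geometric and spatial illusions",
    "general visual illusions",
    "visual anomalies",
)


@dataclass(frozen=True)
class QAItem:
    # Hypothetical record layout; the real benchmark schema may differ.
    question_id: str
    category: str
    answer: str  # gold answer, assumed to be a short string such as an option letter


def normalize(ans: str) -> str:
    # Assumed scoring rule: exact match after trimming whitespace and ignoring case.
    return ans.strip().upper()


def per_category_accuracy(items, predictions):
    """Return {category: accuracy} for non-empty categories, plus an 'overall' entry.

    `predictions` maps question_id -> model answer string.
    A missing prediction counts as wrong.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for item in items:
        if item.category not in CATEGORIES:
            raise ValueError(f"unknown category: {item.category}")
        total[item.category] += 1
        pred = predictions.get(item.question_id, "")
        if normalize(pred) == normalize(item.answer):
            correct[item.category] += 1
    # Per-category accuracy, skipping categories with no items.
    scores = {c: correct[c] / total[c] for c in CATEGORIES if total[c]}
    # Micro-average: every question weighs the same, regardless of category.
    n = sum(total.values())
    scores["overall"] = sum(correct.values()) / n if n else 0.0
    return scores


if __name__ == "__main__":
    # Toy, made-up items and predictions for illustration only (not benchmark data).
    items = [
        QAItem("q1", "color illusions", "A"),
        QAItem("q2", "color illusions", "B"),
        QAItem("q3", "visual anomalies", "C"),
    ]
    predictions = {"q1": "a", "q2": "C", "q3": " C "}
    print(per_category_accuracy(items, predictions))
```

With the toy inputs, one of the two color-illusion answers matches, so that category scores 0.5. The overall score here is micro-averaged across questions, so larger categories weigh more. A macro average over the six categories would instead treat each category equally. This summary does not say which aggregation the paper reports.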

Taxonomy

Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Explainable Artificial Intelligence (XAI)