Illusion-Aware Visual Preprocessing and Anti-Illusion Prompting for Classic Illusion Understanding in Vision-Language Models

Junli Zha; Jiahui Wang; Xinkai Lu; Jinbo Wang

arXiv:2605.08841 · cs.CV · May 12, 2026

TL;DR

This paper introduces a training-free, illusion-aware framework that improves vision-language models' perception of visual illusions through image preprocessing, prompt engineering, and multi-vote ensembling, reaching 90.48% accuracy on the 630-image test set of the CVPR 2026 DataCV Challenge Task 1.

Contribution

The authors propose a training-free approach that combines type-specific image preprocessing, anti-illusion prompt engineering, and multi-vote ensembling to improve illusion understanding in vision-language models.

Findings

01. Achieved 90.48% accuracy on the challenge test set.

02. Achieved 98.41% accuracy on a human-verified subset.

03. Secured 2nd place in the CVPR 2026 DataCV Challenge.

Abstract

Vision-Language Models (VLMs) exhibit a systematic bias on visual illusions: they recall memorized facts rather than perceiving the actual visual differences. This paper presents a training-free framework for the 5th DataCV Challenge Task 1 at CVPR 2026, addressing this perception-versus-memory conflict through three complementary strategies: (1) illusion-aware image preprocessing that weakens illusion-inducing context via type-specific transformations (edge extraction, color isolation, morphological processing, and reference-line overlay), (2) anti-illusion prompt engineering that guides VLMs toward qualitative visual comparison, and (3) a multi-vote ensemble that further improves robustness. Our method achieves 90.48% accuracy on the official 630-image test set using Claude (claude-opus-4-6) with a 5-vote majority ensemble, and 98.41% on a human-verified subset. The approach requires no…
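The three strategies named in the abstract can be pictured as a small pipeline. The sketch below is an illustration only, not the authors' code: `edge_map` is a simple central-difference stand-in for the "edge extraction" transformation (the operator actually used is not given on this page), `ANTI_ILLUSION_PROMPT` is a hypothetical prompt, `ask` is a hypothetical callable that wraps whatever VLM client is used, and the first-seen tie-breaking rule in `majority_vote` is an assumption.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Grayscale image as row-major nested lists; a stand-in for a real image array.
Image = List[List[int]]

# Hypothetical prompt wording, written for this sketch only.
ANTI_ILLUSION_PROMPT = (
    "Compare the marked elements exactly as drawn. "
    "Do not rely on what you know about famous illusions."
)


def edge_map(img: Image, threshold: int = 50) -> Image:
    """Binary edge map from central-difference gradients.

    Assumed stand-in for the edge-extraction step: it keeps contours and
    drops the shading or context that may induce the illusion.
    Border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = img[y][x + 1] - img[y][x - 1]  # horizontal gradient
            gy = img[y + 1][x] - img[y - 1][x]  # vertical gradient
            if abs(gx) + abs(gy) >= threshold:
                out[y][x] = 255
    return out


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer; ties go to the answer seen first."""
    if not answers:
        raise ValueError("majority_vote needs at least one answer")
    counts = Counter(answers)
    best = max(counts.values())
    for answer in answers:
        if counts[answer] == best:
            return answer
    raise AssertionError("unreachable")


def ensemble_answer(
    ask: Callable[[Image, str], str],
    img: Image,
    prompt: str = ANTI_ILLUSION_PROMPT,
    votes: int = 5,
) -> str:
    """Preprocess once, query the model `votes` times, and take the majority."""
    processed = edge_map(img)
    return majority_vote([ask(processed, prompt) for _ in range(votes)])
```

With an odd number of votes and a binary answer set, a tie cannot occur, which is one plausible reason to pick five votes; with more than two answer options the tie-breaking rule starts to matter.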

Code & Models

Repositories

jasminezz/sf-illusion-aware-vlm.git (GitHub)
