TL;DR
This paper introduces a training-free, illusion-aware framework that improves vision-language models' perception of visual illusions through image preprocessing, prompt engineering, and multi-vote ensembling, reaching 90.48% accuracy on the official 630-image challenge test set.
Contribution
The authors propose a novel, training-free approach combining image preprocessing, prompt engineering, and ensemble techniques to enhance illusion understanding in vision-language models.
Findings
Achieved 90.48% accuracy on the official 630-image challenge test set, using Claude (claude-opus-4-6) with a 5-vote majority ensemble.
Achieved 98.41% accuracy on a human-verified subset.
Secured 2nd place in the CVPR 2026 DataCV Challenge.
Abstract
Vision-Language Models (VLMs) exhibit systematic bias toward visual illusions, recalling memorized facts rather than perceiving actual visual differences. This paper presents a training-free framework for the 5th DataCV Challenge Task 1 at CVPR 2026, addressing this perception-versus-memory conflict through three complementary strategies: (1) illusion-aware image preprocessing that weakens illusion-inducing context via type-specific transformations (edge extraction, color isolation, morphological processing, and reference-line overlay), (2) anti-illusion prompt engineering that guides VLMs toward qualitative visual comparison, and (3) a multi-vote ensemble that further improves robustness. Our method achieves 90.48% accuracy on the official 630-image test set using Claude (claude-opus-4-6) with a 5-vote majority ensemble, and 98.41% on a human-verified subset. The approach requires no…
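The 5-vote majority ensemble mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the names `majority_vote`, `ensemble_answer`, and `query_model` are hypothetical, the answer normalization (trimming and lowercasing) and the tie-breaking rule (earliest vote wins) are assumptions, and the model call is replaced by a caller-supplied function so the example runs without any API.

```python
from collections import Counter
from typing import Callable, Sequence


def majority_vote(answers: Sequence[str]) -> str:
    """Return the most frequent answer after simple normalization.

    Assumption: ties are broken in favor of the answer that appeared first.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    normalized = [a.strip().lower() for a in answers]
    counts = Counter(normalized)
    best = max(counts.values())
    # Walk the votes in order so the earliest tied answer wins.
    for answer in normalized:
        if counts[answer] == best:
            return answer
    raise AssertionError("unreachable: some answer always has the top count")


def ensemble_answer(
    query_model: Callable[[str, str], str],  # hypothetical: (image_path, prompt) -> answer
    image_path: str,
    prompt: str,
    n_votes: int = 5,
) -> str:
    """Query the model n_votes times and return the majority answer."""
    votes = [query_model(image_path, prompt) for _ in range(n_votes)]
    return majority_vote(votes)


if __name__ == "__main__":
    # Stub model that returns a fixed sequence of answers, for illustration only.
    canned = iter(["Yes", "no", "yes ", "YES", "no"])
    print(ensemble_answer(lambda img, p: next(canned), "illusion.png", "Are the lines equal?"))
```

An odd number of votes avoids ties on binary questions, which may be one reason to pick five; with more than two possible answers a tie is still possible, hence the explicit tie-breaking rule in the sketch.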
