The Illusion-Illusion: Vision Language Models See Illusions Where There are None
Tomer Ullman

TL;DR
This paper tests vision language models on benign, non-illusory stimuli that resemble common illusions, and finds that the models misinterpret them as illusions, revealing basic processing errors.
Contribution
It introduces the concept of illusion-illusions to test vision language models and demonstrates their tendency to falsely perceive non-illusions as illusions, highlighting core model limitations.
Findings
Models often mistake benign stimuli for illusions.
Vision language models exhibit fundamental perception errors.
These errors reflect broader issues in current perception modeling.
Abstract
Illusions are entertaining, but they are also a useful diagnostic tool in cognitive science, philosophy, and neuroscience. A typical illusion shows a gap between how something "really is" and how something "appears to be", and this gap helps us understand the mental processing that leads to how something appears to be. Illusions are also useful for investigating artificial systems, and much research has examined whether computational models of perception fall prey to the same illusions as people. Here, I invert the standard use of perceptual illusions to examine basic processing errors in current vision language models. I present these models with illusory-illusions, neighbors of common illusions that should not elicit processing errors. These include such things as perfectly reasonable ducks, crooked lines that truly are crooked, circles that seem to have different sizes because they…
