Do VLMs Perceive or Recall? Probing Visual Perception vs. Memory with Classic Visual Illusions

Xiaoxiao Sun; Mingyang Li; Kun Yuan; Min Woo Sun; Mark Endo; Shengguang Wu; Changlin Li; Yuhui Zhang; Zeyu Wang; Serena Yeung-Levy

arXiv:2601.22150 · cs.CV · April 1, 2026

TL;DR

This paper introduces VI-Probe, a framework for determining whether large vision-language models perceive visual changes or rely on memorized patterns. It reveals that the underlying mechanisms differ from model to model.

Contribution

The study presents a systematic probing framework with controlled visual illusions to analyze perception versus recall in VLMs, moving beyond average accuracy measures.

Findings

01

GPT-5 shows memory-override behavior.

02

Claude-Opus-4.1 exhibits perception-memory competition.

03

Qwen variants show signs of visual-processing limits.

Abstract

Large Vision-Language Models (VLMs) often answer classic visual illusions "correctly" on original images, yet persist with the same responses when illusion factors are inverted, even though the visual change is obvious to humans. This raises a fundamental question: do VLMs perceive visual changes or merely recall memorized patterns? While several studies have noted this phenomenon, the underlying causes remain unclear. To move from observations to systematic understanding, this paper introduces VI-Probe, a controllable visual-illusion framework with graded perturbations and matched visual controls (without illusion inducer) that disentangles visually grounded perception from language-driven recall. Unlike prior work that focuses on averaged accuracy, we measure stability and sensitivity using Polarity-Flip Consistency, Template Fixation Index, and an illusion multiplier normalized…
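
The abstract names Polarity-Flip Consistency and the Template Fixation Index but, because it is truncated here, does not define them. The sketch below illustrates only the general idea, under our own assumptions. Each illusion image is paired with a version whose illusion factor is inverted. We then count (a) pairs answered correctly under both polarities and (b) pairs where the model repeats its original answer even though the correct answer has changed. The names `IllusionPair`, `flip_consistency`, and `template_fixation`, and their formulas, are hypothetical stand-ins rather than the paper's definitions.

```python
from dataclasses import dataclass


@dataclass
class IllusionPair:
    """One illusion image plus its polarity-inverted counterpart (hypothetical schema)."""
    truth_original: str  # correct answer on the original image
    truth_inverted: str  # correct answer after the illusion factor is inverted
    pred_original: str   # model answer on the original image
    pred_inverted: str   # model answer on the inverted image


def flip_consistency(pairs: list[IllusionPair]) -> float:
    """Illustrative stand-in: fraction of pairs answered correctly on BOTH polarities."""
    if not pairs:
        raise ValueError("need at least one pair")
    hits = sum(
        p.pred_original == p.truth_original and p.pred_inverted == p.truth_inverted
        for p in pairs
    )
    return hits / len(pairs)


def template_fixation(pairs: list[IllusionPair]) -> float:
    """Illustrative stand-in: among pairs whose correct answer changes under
    inversion, the fraction where the model repeats its original answer."""
    changed = [p for p in pairs if p.truth_original != p.truth_inverted]
    if not changed:
        raise ValueError("no pairs whose correct answer changes")
    stuck = sum(p.pred_inverted == p.pred_original for p in changed)
    return stuck / len(changed)


# Usage: one "stuck" pair (the model repeats "equal") and three correct pairs.
pairs = [
    IllusionPair("equal", "different", "equal", "equal"),
    IllusionPair("equal", "different", "equal", "different"),
    IllusionPair("equal", "different", "equal", "different"),
    IllusionPair("equal", "different", "equal", "different"),
]
print(flip_consistency(pairs), template_fixation(pairs))
```

Under these assumed definitions, a model that recalls the textbook answer instead of looking at the image would score low on the first measure and high on the second. This is the pattern the abstract describes for models that "persist with the same responses when illusion factors are inverted."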

Code & Models

Repositories

https://sites.google.com/view/vi-probe
github