Reasoning or Pattern Matching? Probing Large Vision-Language Models with Visual Puzzles

Maria Lymperaiou; Vasileios Karampinis; Giorgos Filandrianos; Angelos Vlachos; Chrysoula Zerva; Athanasios Voulodimos

arXiv:2601.13705 · cs.CV · January 21, 2026

Open Access

TL;DR

This paper reviews how visual puzzles serve as diagnostic tools for assessing reasoning in large vision-language models, highlighting current limitations and proposing future directions for more reasoning-aware systems.

Contribution

The survey provides a unified framework for understanding visual puzzle reasoning in LVLMs, links puzzle design to cognitive reasoning mechanisms, and identifies key limitations in current models.

Findings

01. Current models show brittle generalization.
02. Perception and reasoning are tightly entangled.
03. There is a gap between explanations and faithful execution.

Abstract

Puzzles have long served as compact and revealing probes of human cognition, isolating abstraction, rule discovery, and systematic reasoning with minimal reliance on prior knowledge. Leveraging these properties, visual puzzles have recently emerged as a powerful diagnostic tool for evaluating the reasoning abilities of Large Vision-Language Models (LVLMs), offering controlled, verifiable alternatives to open-ended multimodal benchmarks. This survey provides a unified perspective of visual puzzle reasoning in LVLMs. We frame visual puzzles through a common abstraction and organize existing benchmarks by the reasoning mechanisms they target (inductive, analogical, algorithmic, deductive, and geometric/spatial), thereby linking puzzle design to the cognitive operations required for solving. Synthesizing empirical evidence across these categories, we identify consistent limitations in…

Taxonomy

Topics: Multimodal Machine Learning Applications · Neurobiology of Language and Bilingualism · Language, Metaphor, and Cognition