I Spy With My Model's Eye: Visual Search as a Behavioural Test for MLLMs

John Burden; Jonathan Prunty; Ben Slater; Matthieu Tehenan; Greg Davis; Lucy Cheke

arXiv:2510.19678·cs.CV·October 23, 2025

Open Access

TL;DR

This paper adapts visual search paradigms from cognitive psychology to evaluate the perceptual mechanisms of multimodal large language models, revealing human-like pop-out effects and scene priors in these models.

Contribution

The paper introduces a cognitively grounded, black-box evaluation method for MLLMs based on visual search tests, and uses it to show where their perceptual behaviour resembles that of humans.

Findings

1. MLLMs exhibit pop-out effects in color and size-based searches.

2. MLLMs show capacity limits in conjunctive (multi-feature) searches.

3. Evidence suggests MLLMs incorporate scene priors like lighting direction.
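
To make the distinction between single-feature and conjunctive search concrete, here is a minimal sketch of how the two kinds of display could be specified. It is an illustration only, not the authors' stimulus code: the `Item` class, the `make_display` function, the red/green and large/small feature values, and the choice of a red, large target are all assumptions made for this example.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    colour: str
    size: str
    x: float  # normalised horizontal position in [0, 1)
    y: float  # normalised vertical position in [0, 1)


# Hypothetical target definition: the one red, large item in the display.
TARGET = ("red", "large")


def make_display(set_size, mode, rng):
    """Return one target plus (set_size - 1) distractors in random order.

    mode="single": every distractor differs from the target in colour only,
        so a single feature is enough to find the target.
    mode="conjunction": every distractor shares exactly one feature with the
        target, so only the colour AND size combination identifies it.
    """
    if set_size < 3:
        raise ValueError("set_size must be at least 3")
    if mode == "single":
        distractor_kinds = [("green", "large")]
    elif mode == "conjunction":
        distractor_kinds = [("red", "small"), ("green", "large")]
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    # Cycle through the distractor kinds so they are roughly balanced.
    kinds = [TARGET] + [
        distractor_kinds[i % len(distractor_kinds)] for i in range(set_size - 1)
    ]
    rng.shuffle(kinds)
    return [Item(colour, size, rng.random(), rng.random()) for colour, size in kinds]


if __name__ == "__main__":
    for item in make_display(6, "conjunction", random.Random(0)):
        print(item)
```

Varying `set_size` while holding `mode` fixed gives the manipulation that a visual search test relies on: the number of distractors changes, and the question is whether performance changes with it. Rendering these specifications to an image and querying a model are left out of the sketch.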

Abstract

Multimodal large language models (MLLMs) achieve strong performance on vision-language tasks, yet their visual processing is opaque. Most black-box evaluations measure task accuracy, but reveal little about underlying mechanisms. Drawing on cognitive psychology, we adapt classic visual search paradigms, originally developed to study human perception, to test whether MLLMs exhibit the "pop-out" effect, where salient visual features are detected independently of distractor set size. Using controlled experiments targeting colour, size and lighting features, we find that advanced MLLMs exhibit human-like pop-out effects in colour- or size-based disjunctive (single feature) search, as well as capacity limits for conjunctive (multiple feature) search. We also find evidence to suggest that MLLMs, like humans, incorporate natural scene priors such as lighting direction into object…
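
The abstract characterises pop-out as detection that is independent of distractor set size. One simple way to quantify that, sketched here under stated assumptions, is to fit a straight line to performance against set size and inspect the slope: a slope near zero is consistent with pop-out, while a clearly negative slope is consistent with a capacity limit. The function name `search_slope`, the use of accuracy as the performance measure, and the numbers in the usage example are illustrative placeholders, not the paper's analysis or results.

```python
def search_slope(set_sizes, accuracies):
    """Ordinary least-squares slope of accuracy as a function of set size.

    Returns the change in accuracy per additional item in the display.
    """
    n = len(set_sizes)
    if n != len(accuracies):
        raise ValueError("set_sizes and accuracies must have the same length")
    if n < 2:
        raise ValueError("need at least two set sizes to fit a slope")

    mean_x = sum(set_sizes) / n
    mean_y = sum(accuracies) / n
    sxx = sum((x - mean_x) ** 2 for x in set_sizes)
    if sxx == 0:
        raise ValueError("set sizes must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(set_sizes, accuracies))
    return sxy / sxx


if __name__ == "__main__":
    # Made-up placeholder numbers, chosen only to show the two patterns.
    sizes = [4, 8, 16, 32]
    flat_pattern = [0.98, 0.97, 0.98, 0.97]       # pop-out-like: barely changes
    declining_pattern = [0.95, 0.85, 0.65, 0.25]  # capacity-limited-like: drops

    print("flat slope:", search_slope(sizes, flat_pattern))
    print("declining slope:", search_slope(sizes, declining_pattern))
```

A linear fit is only a first approximation. Accuracy is bounded between 0 and 1, so a real analysis would likely need a model that respects those bounds and some estimate of uncertainty across repeated trials.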

Taxonomy

Topics: Multimodal Machine Learning Applications · Neurobiology of Language and Bilingualism · Categorization, perception, and language