Vision-language models learn the geometry of human perceptual space

Craig Sanders; Billy Dickson; Sahaj Singh Maini; Robert Nosofsky; Zoran Tiganj

arXiv:2510.20859·q-bio.NC·October 27, 2025

TL;DR

This paper shows that vision-language models learn a perceptual space whose geometry resembles that of human perception, and that this geometry predicts human categorization better than one based directly on human judgments, bridging AI and cognitive science.

Contribution

Introduces a method to analyze the internal geometry of VLMs and shows they capture human-like perceptual structures relevant for categorization.

Findings

01

VLMs recover low-dimensional perceptual spaces aligned with human perception.

02

AI-derived spaces predict human categorization more accurately than a space derived from human judgments.

03

Provides a scalable approach to study cognitive representations using AI models.
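
To make the first finding concrete, here is a minimal sketch of how a low-dimensional space can be recovered from pairwise similarity ratings. It uses classical (Torgerson) multidimensional scaling as a stand-in; the summary does not say which MDS variant the authors use. The function name `classical_mds`, the 1-to-9 rating scale, and the example ratings are all illustrative assumptions, not data or code from the paper.

```python
import numpy as np

def classical_mds(dissim, n_dims=2):
    """Embed items in n_dims dimensions from a symmetric dissimilarity matrix.

    Classical (Torgerson) MDS; a hypothetical stand-in for whatever MDS
    procedure the paper actually uses.
    """
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-center the squared dissimilarities to get a Gram-like matrix.
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    # eigh returns eigenvalues in ascending order; keep the largest n_dims.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_dims]
    # Clip small negative eigenvalues caused by noise or non-Euclidean input.
    top_vals = np.clip(eigvals[order], 0.0, None)
    return eigvecs[:, order] * np.sqrt(top_vals)

# Made-up similarity ratings for four items (9 = identical, 1 = very different).
similarity = np.array([
    [9, 7, 2, 1],
    [7, 9, 3, 2],
    [2, 3, 9, 8],
    [1, 2, 8, 9],
], dtype=float)

# Convert similarity to dissimilarity; the diagonal becomes zero.
dissimilarity = similarity.max() - similarity
coords = classical_mds(dissimilarity, n_dims=2)
print(coords.shape)
```

Each row of `coords` is one item's position in the recovered space. Comparing such axes with those of a human-derived space would need an additional alignment step, which is not shown here.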

Abstract

In cognitive science and AI, a longstanding question is whether machines learn representations that align with those of the human mind. While current models show promise, it remains an open question whether this alignment is superficial or reflects a deeper correspondence in the underlying dimensions of representation. Here we introduce a methodology to probe the internal geometry of vision-language models (VLMs) by having them generate pairwise similarity judgments for a complex set of natural objects. Using multidimensional scaling, we recover low-dimensional psychological spaces and find that their axes show a strong correspondence with the principal axes of human perceptual space. Critically, when this AI-derived representational geometry is used as the input to a classic exemplar model of categorization, it predicts human classification behavior more accurately than a space…
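
The abstract does not name the exemplar model it uses. As an illustration only, the sketch below assumes a model of the generalized-context-model type: a probe item's similarity to each stored exemplar decays exponentially with distance in the psychological space, and the probability of a category response is that category's summed similarity divided by the total. The function name `gcm_probabilities`, the parameter defaults (`c`, `weights`, `r`), and the coordinates are hypothetical.

```python
import math

def gcm_probabilities(probe, exemplars, labels, c=1.0, weights=None, r=2.0):
    """Return category response probabilities for one probe item.

    probe     : coordinates of the item to classify
    exemplars : coordinates of stored category members
    labels    : category label of each exemplar
    c         : sensitivity (larger values make similarity fall off faster)
    weights   : attention weight per dimension (defaults to equal weights)
    r         : distance metric exponent (2.0 = Euclidean, 1.0 = city-block)
    """
    n_dims = len(probe)
    if weights is None:
        weights = [1.0 / n_dims] * n_dims
    summed = {}
    for point, label in zip(exemplars, labels):
        # Attention-weighted Minkowski distance between probe and exemplar.
        dist = sum(
            w * abs(p - x) ** r for w, p, x in zip(weights, probe, point)
        ) ** (1.0 / r)
        # Similarity decays exponentially with distance.
        summed[label] = summed.get(label, 0.0) + math.exp(-c * dist)
    total = sum(summed.values())
    # Luce choice rule: summed similarity to a category over the grand total.
    return {label: s / total for label, s in summed.items()}

# Made-up 2-D coordinates, e.g. rows of an MDS solution.
exemplars = [(0.0, 0.0), (1.0, 0.0), (4.0, 4.0), (5.0, 4.0)]
labels = ["A", "A", "B", "B"]
print(gcm_probabilities((0.5, 0.0), exemplars, labels))
```

Under this setup, swapping the exemplar coordinates between an AI-derived space and a human-derived space, while keeping the model fixed, is one way to compare how well each space accounts for observed classification responses.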
