Comparing perceptual judgments in large multimodal models and humans

Billy Dickson; Sahaj Singh Maini; Craig Sanders; Robert Nosofsky; Zoran Tiganj

PMC · DOI: 10.3758/s13428-025-02728-w · June 19, 2025

Open Access

TL;DR

This paper compares perceptual judgments of rock images made by large multimodal models such as GPT-4o with those made by humans, finding that the model aligns well with humans on basic features but less well on abstract ones.

Contribution

The study introduces a benchmark for evaluating LMMs using human perceptual judgment data from cognitive science.

Findings

01. GPT-4o showed strong correlation with human ratings for basic perceptual dimensions like lightness and texture.

02. The model's alignment with humans was weaker for abstract rock-specific features like organization and pegmatitic structure.

03. LMMs like GPT-4o are approaching the level of human consensus on perceptual features of rock images.
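
To make the comparison in these findings concrete, the sketch below shows one plausible way to score model-human alignment on a single perceptual dimension and to set it against a human-consensus baseline. It is an illustration under stated assumptions, not the authors' analysis: the summary does not say which statistic was used, so Pearson correlation and the leave-one-out baseline are assumptions, the function names are hypothetical, and the ratings are made up.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length rating lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def model_human_alignment(human_ratings, model_ratings):
    """Correlate the model's ratings with the mean human rating per image.

    human_ratings: one list per participant, one rating per image.
    model_ratings: one rating per image, in the same image order.
    """
    human_mean = [mean(col) for col in zip(*human_ratings)]
    return pearson(human_mean, model_ratings)


def human_consensus(human_ratings):
    """Leave-one-out baseline (an assumption, not taken from the paper):
    correlate each participant with the mean of the remaining participants,
    then average those correlations."""
    scores = []
    for i, held_out in enumerate(human_ratings):
        others = [r for j, r in enumerate(human_ratings) if j != i]
        others_mean = [mean(col) for col in zip(*others)]
        scores.append(pearson(held_out, others_mean))
    return mean(scores)


# Made-up ratings for five images on one dimension (e.g. lightness).
human = [
    [1, 3, 5, 7, 9],
    [2, 3, 6, 6, 9],
    [1, 4, 5, 8, 8],
]
model = [2, 3, 5, 7, 8]

alignment = model_human_alignment(human, model)
consensus = human_consensus(human)
print(f"model-human r = {alignment:.3f}, human consensus r = {consensus:.3f}")
```

Under this framing, a model "approaches human consensus" on a dimension when its alignment score comes close to the leave-one-out baseline for that dimension. Running the same comparison separately for each dimension would show where the gap is larger, as the findings report for the abstract rock-specific features.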

Abstract

Cognitive scientists commonly collect participants' judgments regarding perceptual characteristics of stimuli to develop and evaluate models of attention, memory, learning, and decision-making. For instance, to model human responses in tasks of category learning and item recognition, researchers often collect perceptual judgments of images in order to embed the images in multidimensional feature spaces. This process is time-consuming and costly. Recent advancements in large multimodal models (LMMs) provide a potential alternative because such models can respond to prompts that include both text and images and could potentially replace human participants. To test whether the available LMMs can indeed be useful for this purpose, we evaluated their judgments on a dataset consisting of rock images that has been widely used by cognitive scientists. The dataset includes human perceptual…

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Child and Animal Learning Development