Comparing perceptual judgments in large multimodal models and humans
Billy Dickson, Sahaj Singh Maini, Craig Sanders, Robert Nosofsky, Zoran Tiganj

TL;DR
This paper compares perceptual judgments of rock images made by large multimodal models (LMMs) such as GPT-4o with those made by humans. GPT-4o aligns well with humans on basic features but less well on abstract ones.
Contribution
The study introduces a benchmark for evaluating LMMs using human perceptual judgment data from cognitive science.
Findings
GPT-4o showed strong correlation with human ratings for basic perceptual dimensions like lightness and texture.
The model's alignment with humans was weaker for abstract rock-specific features like organization and pegmatitic structure.
LMMs like GPT-4o are approaching the level of human consensus on perceptual features of rock images.
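To make the notion of "correlation with human ratings" concrete, here is a minimal sketch of how model-human alignment on one perceptual dimension could be scored. It assumes a Pearson correlation between the model's per-image ratings and the mean human rating per image; the summary above does not state which coefficient the authors used. The function names and the toy ratings are hypothetical and are not taken from the paper's dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of ratings."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant ratings")
    return cov / sqrt(var_x * var_y)


def mean_human_ratings(ratings_by_participant):
    """Average each image's rating across participants.

    ratings_by_participant: one list per participant, one rating per image.
    """
    n_participants = len(ratings_by_participant)
    return [sum(column) / n_participants for column in zip(*ratings_by_participant)]


# Hypothetical toy ratings (NOT from the paper): 2 participants x 4 images,
# all rating a single dimension such as lightness.
human_lightness = [
    [2, 5, 8, 4],
    [3, 5, 9, 3],
]
model_lightness = [2, 6, 9, 3]  # hypothetical model ratings for the same images

alignment = pearson(model_lightness, mean_human_ratings(human_lightness))
print(f"model-human correlation on this dimension: {alignment:.3f}")
```

Repeating this per dimension would give one alignment score for each feature, which is the kind of comparison the findings above describe for basic versus abstract dimensions.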
Abstract
Cognitive scientists commonly collect participants' judgments regarding perceptual characteristics of stimuli to develop and evaluate models of attention, memory, learning, and decision-making. For instance, to model human responses in tasks of category learning and item recognition, researchers often collect perceptual judgments of images in order to embed the images in multidimensional feature spaces. This process is time-consuming and costly. Recent advancements in large multimodal models (LMMs) provide a potential alternative because such models can respond to prompts that include both text and images and could potentially replace human participants. To test whether the available LMMs can indeed be useful for this purpose, we evaluated their judgments on a dataset consisting of rock images that has been widely used by cognitive scientists. The dataset includes human perceptual…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Child and Animal Learning Development
