Grounded Concreteness: Human-Like Concreteness Sensitivity in Vision-Language Models
Aryan Roy, Zekun Wang, Christopher J. MacLellan

TL;DR
This study investigates whether vision-language models (VLMs) develop more human-like sensitivity to linguistic concreteness than text-only models, finding that VLMs show stronger concreteness effects, more structured representations, and closer alignment with human judgments.
Contribution
A controlled comparison of matched Llama text-only and Llama Vision models across several scales, showing that multimodal pretraining strengthens concreteness sensitivity and alignment with human concreteness judgments at multiple levels of evaluation.
Findings
VLMs achieve higher QA accuracy on concrete inputs.
Representations in VLMs are organized along a concreteness axis.
VLMs produce concreteness ratings that align more closely with human norms.
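Each finding compares a model-side quantity with concreteness. The sketch below shows one way such comparisons could be computed. It assumes per-word human concreteness ratings, a model rating or embedding for each word, and per-question concreteness scores with 0/1 correctness labels. All function names are hypothetical, and the paper's exact statistics may differ. In particular, the difference-of-means "concreteness axis" used here is an assumed stand-in, and the paper may use a learned probe or another method.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(sxx * syy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def accuracy_concreteness_corr(concreteness, correct):
    """Point-biserial correlation between question concreteness and 0/1 correctness."""
    return pearson(concreteness, [1.0 if c else 0.0 for c in correct])


def concreteness_axis(embeddings, ratings, threshold):
    """Hypothetical axis: mean embedding of concrete words minus mean of abstract words."""
    concrete = [e for e, r in zip(embeddings, ratings) if r >= threshold]
    abstract = [e for e, r in zip(embeddings, ratings) if r < threshold]
    return [mean(e[d] for e in concrete) - mean(e[d] for e in abstract)
            for d in range(len(embeddings[0]))]


def project(embedding, axis):
    """Scalar position of an embedding along the axis (dot product)."""
    return sum(a * b for a, b in zip(embedding, axis))


if __name__ == "__main__":
    # Toy numbers for illustration only; these are not data from the paper.
    human = [4.9, 4.7, 1.6, 2.0, 3.5]
    model = [5, 4, 1, 2, 3]
    print("rating alignment (Spearman):", spearman(human, model))

    embs = [[1.0, 0.2], [0.9, 0.1], [0.1, 0.8], [0.0, 0.9]]
    ratings = [4.8, 4.5, 1.5, 1.8]
    axis = concreteness_axis(embs, ratings, threshold=3.0)
    scores = [project(e, axis) for e in embs]
    print("axis alignment (Spearman):", spearman(scores, ratings))
```

Spearman's rho is used for rating alignment because it depends only on rank order, so it tolerates a model whose rating scale is shifted or compressed relative to the human norms. `scipy.stats.spearmanr` computes the same tie-averaged statistic. Fitting the axis and scoring it on the same words, as in the toy example, is circular. A real evaluation would fit the axis on one set of words and correlate projections with human ratings on held-out words.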
Abstract
Do vision-language models (VLMs) develop more human-like sensitivity to linguistic concreteness than text-only large language models (LLMs) when both are evaluated with text-only prompts? We study this question with a controlled comparison between matched Llama text backbones and their Llama Vision counterparts across multiple model scales, treating multimodal pretraining as an ablation on perceptual grounding rather than access to images at inference. We measure concreteness effects at three complementary levels: (i) output behavior, by relating question-level concreteness to QA accuracy; (ii) embedding geometry, by testing whether representations organize along a concreteness axis; and (iii) attention dynamics, by quantifying context reliance via attention-entropy measures. In addition, we elicit token-level concreteness ratings from models and evaluate alignment to human norm…
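The attention-dynamics level relies on the entropy of attention weights. The sketch below computes normalized Shannon entropy for each attention row. It assumes each row is a softmax distribution over context positions. Low entropy means attention is concentrated on a few tokens, and high entropy means it is spread across the context. Such rows could come from, for example, the per-layer attention tensors that Hugging Face Transformers models return when called with `output_attentions=True`. This summary does not state how the paper aggregates over heads, layers, or token spans, so the simple averaging here is an assumption, and the function names are hypothetical.

```python
import math


def attention_entropy(weights, normalize=True):
    """Shannon entropy (in nats) of one attention row.

    `weights` are non-negative and sum to 1, e.g. one query position's softmax
    output over n context positions. With normalize=True the entropy is divided
    by log(n): 0 means all mass on a single position, 1 means uniform attention.
    """
    total = sum(weights)
    if not math.isclose(total, 1.0, rel_tol=1e-6):
        raise ValueError(f"attention weights must sum to 1, got {total}")
    h = -sum(p * math.log(p) for p in weights if p > 0)  # convention: 0*log(0) = 0
    if not normalize:
        return h
    n = len(weights)
    return h / math.log(n) if n > 1 else 0.0


def mean_attention_entropy(rows, normalize=True):
    """Hypothetical aggregate: average entropy over a set of attention rows,
    e.g. all query positions of one head, or all heads in one layer."""
    return sum(attention_entropy(r, normalize) for r in rows) / len(rows)


if __name__ == "__main__":
    focused = [0.97, 0.01, 0.01, 0.01]  # nearly all mass on one token
    diffuse = [0.25, 0.25, 0.25, 0.25]  # spread evenly over the context
    print(attention_entropy(focused))   # low, well below the diffuse row
    print(attention_entropy(diffuse))   # 1.0 up to floating-point rounding
    print(mean_attention_entropy([focused, diffuse]))
```

Normalizing by log(n) makes entropies comparable across prompts of different lengths. Without this step, longer contexts would show higher raw entropy even when attention is equally diffuse.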
Taxonomy
Topics: Multimodal Machine Learning Applications · Neurobiology of Language and Bilingualism · Language, Metaphor, and Cognition
