Exploring Spatial Schema Intuitions in Large Language and Vision Models
Philipp Wicke, Lennart Wachowiak

TL;DR
This paper investigates whether large language and vision models implicitly understand human spatial intuitions by reproducing psycholinguistic experiments, revealing surprising correlations despite their non-embodied nature.
Contribution
It demonstrates that LLMs can capture aspects of human spatial reasoning, providing new insights into their capabilities beyond language processing.
Findings
Model outputs correlate with human responses in the reproduced psycholinguistic experiments.
Vision-language models show reduced correlation with human spatial intuitions.
Language models give polarized responses, a notable difference from human responses.
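To make the notion of correlation between model outputs and human responses concrete, here is a minimal sketch of a rank correlation over paired ratings. This summary does not state which correlation measure the authors used, so the choice of Spearman's rho is an assumption. The helper names (`ranks`, `pearson`, `spearman`) and the numbers in `human` and `model` are hypothetical placeholders, not data from the paper.

```python
from math import sqrt


def ranks(values):
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Placeholder ratings per stimulus, invented for illustration only.
human = [0.10, 0.35, 0.40, 0.80]
model = [0.05, 0.90, 0.20, 0.95]
rho = spearman(human, model)
print(f"Spearman rho = {rho:.2f}")
```

A rank-based measure is used here because it depends only on the ordering of responses, which keeps it meaningful even when one side gives polarized, near-extreme values.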
Abstract
Despite the ubiquity of large language models (LLMs) in AI research, the question of embodiment in LLMs remains underexplored, distinguishing them from embodied systems in robotics where sensory perception directly informs physical action. Our investigation navigates the intriguing terrain of whether LLMs, despite their non-embodied nature, effectively capture implicit human intuitions about fundamental, spatial building blocks of language. We employ insights from spatial cognitive foundations developed through early sensorimotor experiences, guiding our exploration through the reproduction of three psycholinguistic experiments. Surprisingly, correlations between model outputs and human responses emerge, revealing adaptability without a tangible connection to embodied experiences. Notable distinctions include polarized language model responses and reduced correlations in vision language…
Taxonomy
Topics: Geographic Information Systems Studies · Speech and dialogue systems · Constraint Satisfaction and Optimization
