Do LLMs Capture Embodied Cognition and Cultural Variation? Cross-Linguistic Evidence from Demonstratives
Yu Wang, Emmanuele Chersoni, Chu-Ren Huang

TL;DR
This study investigates whether large language models understand embodied cognition and cultural differences by analyzing their responses to demonstratives across languages, revealing significant gaps compared to human performance.
Contribution
Introduces demonstratives as a novel probe for evaluating embodied cognition and cultural variation in LLMs, highlighting their limitations in cross-cultural understanding.
Findings
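English speakers reliably distinguish proximal-distal referents but struggle with perspective-taking.
Chinese speakers switch perspectives fluently but tolerate distal ambiguity.
The five LLMs tested fail to capture the proximal-distal contrast and show no cultural differences, defaulting to English-centric reasoning.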
Abstract
Do large language models (LLMs) truly acquire embodied cognition and cultural conventions from text? We introduce demonstratives, fundamental spatial expressions like "this/that" in English and "zhè/nà" in Chinese, as a novel probe for grounded knowledge. Using 6,400 responses from 320 native speakers, we establish a human baseline: English speakers reliably distinguish proximal-distal referents but struggle with perspective-taking, while Chinese speakers switch perspectives fluently but tolerate distal ambiguity. In contrast, five state-of-the-art LLMs fail to inherently understand the proximal-distal contrast and show no cultural differences, defaulting to English-centric reasoning. Our study contributes (i) a new task, based on demonstratives, as a new lens for evaluating embodied cognition and cultural conventions; (ii) empirical evidence of cross-cultural asymmetries in human…
