Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity
Samhita Bollepally, Aurora Sloman-Moll, Takashi Yamauchi

TL;DR
This study compares how humans and instruction-tuned large language models interpret figurative and socially grounded language, revealing surface-level alignment but significant divergence in deeper, representational understanding, especially for idioms and slang.
Contribution
The study compares human and LLM interpretive patterns across six linguistic traits and highlights the models' limitations in understanding figurative language.
Findings
Humans and LLMs align at the surface level but diverge at the representational level.
GPT-4 most closely mimics human interpretive patterns.
All models struggle with context-dependent expressions like sarcasm and slang.
Abstract
Large language models generate judgments that resemble those of humans. Yet the extent to which these models align with human judgments in interpreting figurative and socially grounded language remains uncertain. To investigate this, human participants and four instruction-tuned LLMs of different sizes (GPT-4, Gemma-2-9B, Llama-3.2, and Mistral-7B) rated 240 dialogue-based sentences representing six linguistic traits: conventionality, sarcasm, funny, emotional, idiomacy, and slang. Each of the 240 sentences was paired with 40 interpretive questions, and both humans and LLMs rated these sentences on a 10-point Likert scale. Results indicated that humans and LLMs aligned at the surface level but diverged significantly at the representational level, especially in interpreting figurative sentences involving idioms and Gen Z slang. GPT-4 most closely approximates human…
Taxonomy
Topics: Language, Metaphor, and Cognition · Neurobiology of Language and Bilingualism · Action Observation and Synchronization
