Can LLMs interpret figurative language as humans do?: surface-level vs representational similarity

Samhita Bollepally; Aurora Sloman-Moll; Takashi Yamauchi

arXiv:2601.09041 · cs.CL · January 15, 2026

TL;DR

This study compares how humans and instruction-tuned large language models interpret figurative and socially grounded language, revealing surface-level alignment but significant divergence in deeper, representational understanding, especially for idioms and slang.

Contribution

The study compares human and LLM interpretive patterns across six linguistic traits in detail, highlighting the models' limitations in understanding figurative language.

Findings

01

Humans and LLMs align at surface level but diverge in representational interpretation.

02

GPT-4 most closely mimics human interpretive patterns.

03

All models struggle with context-dependent expressions like sarcasm and slang.

Abstract

Large language models generate judgments that resemble those of humans. Yet the extent to which these models align with human judgments in interpreting figurative and socially grounded language remains uncertain. To investigate this, human participants and four instruction-tuned LLMs of different sizes (GPT-4, Gemma-2-9B, Llama-3.2, and Mistral-7B) rated 240 dialogue-based sentences representing six linguistic traits: conventionality, sarcasm, funny, emotional, idiomacy, and slang. Each of the 240 sentences was paired with 40 interpretive questions, and both humans and LLMs rated these sentences on a 10-point Likert scale. Results indicated that humans and LLMs aligned at the surface level but diverged significantly at the representational level, especially in interpreting figurative sentences involving idioms and Gen Z slang. GPT-4 most closely approximates human…
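The abstract contrasts surface-level alignment with representational similarity but does not spell out the computation on this page. The sketch below is our illustration under stated assumptions, not necessarily the authors' procedure: ratings form a sentences × questions matrix of integers on a 1-10 scale; "surface-level" similarity is assumed to be the correlation of per-sentence mean ratings; "representational" similarity is assumed to follow representational similarity analysis (RSA), correlating the upper triangles of sentence-by-sentence dissimilarity matrices built from each sentence's 40-question rating profile. All function names and the toy data are hypothetical.

```python
import math
import random
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither sequence is constant


def surface_similarity(human, model):
    """Surface level (assumed): correlate one mean rating per sentence."""
    h_means = [sum(row) / len(row) for row in human]
    m_means = [sum(row) / len(row) for row in model]
    return pearson(h_means, m_means)


def rdm_upper(ratings):
    """Upper triangle of a sentence-by-sentence dissimilarity matrix.

    Dissimilarity between two sentences = 1 - Pearson r of their
    question-rating profiles (a common RSA choice, assumed here).
    """
    n = len(ratings)
    return [1 - pearson(ratings[i], ratings[j]) for i, j in combinations(range(n), 2)]


def representational_similarity(human, model):
    """Representational level (assumed): correlate the two dissimilarity matrices.

    RSA studies often use Spearman correlation here; Pearson keeps the sketch short.
    """
    return pearson(rdm_upper(human), rdm_upper(model))


def toy_ratings(n_sentences, n_questions, seed):
    """Hypothetical integer ratings on a 1-10 scale (not the study's data)."""
    rng = random.Random(seed)
    return [[rng.randint(1, 10) for _ in range(n_questions)] for _ in range(n_sentences)]


def shuffle_profiles(ratings, seed):
    """Permute question order within each sentence: means unchanged, profiles scrambled."""
    rng = random.Random(seed)
    out = []
    for row in ratings:
        row = list(row)  # copy so the input matrix is not mutated
        rng.shuffle(row)
        out.append(row)
    return out


if __name__ == "__main__":
    human = toy_ratings(24, 40, seed=0)
    model = shuffle_profiles(human, seed=1)
    print("surface:", round(surface_similarity(human, model), 3))
    print("representational:", round(representational_similarity(human, model), 3))
```

Under these assumptions, shuffling question order within each sentence leaves every sentence's mean rating unchanged, so the surface score stays at 1, while the representational score falls toward chance. This is one way a surface-level match can coexist with representational divergence.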

Taxonomy

Topics: Language, Metaphor, and Cognition · Neurobiology of Language and Bilingualism · Action Observation and Synchronization