Tug-of-war between idioms' figurative and literal interpretations in LLMs
Soyoung Oh, Xinting Huang, Mathis Pink, Michael Hahn, Vera Demberg

TL;DR
This paper uses causal tracing to investigate how pretrained causal transformers process idioms that have both figurative and literal meanings, revealing mechanisms that let the models disambiguate between the two interpretations while maintaining both.
Contribution
It applies causal tracing systematically to uncover how language models handle idiomatic ambiguity and which mechanisms they use to resolve it.
Findings
Models retrieve figurative interpretations in early layers.
Context influences disambiguation from the earliest layers.
Parallel pathways carry both literal and figurative interpretations.
Abstract
Idioms present a unique challenge for language models due to their non-compositional figurative interpretations, which often strongly diverge from the idiom's literal interpretation. In this paper, we employ causal tracing to systematically analyze how pretrained causal transformers deal with this ambiguity. We localize three mechanisms: (i) Early sublayers and specific attention heads retrieve an idiom's figurative interpretation, while suppressing its literal interpretation. (ii) When disambiguating context precedes the idiom, the model leverages it from the earliest layer and later layers refine the interpretation if the context conflicts with the retrieved interpretation. (iii) Then, selective, competing pathways carry both interpretations: an intermediate pathway prioritizes the figurative interpretation and a parallel direct route favors the literal interpretation, ensuring that…
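The abstract names causal tracing as the core tool but does not spell out the procedure. Below is a minimal sketch of the generic recipe: run the model on a clean input, corrupt the idiom tokens, then restore one clean hidden state at a time and measure how much of the clean output returns. It assumes a hypothetical two-dimensional toy causal model in place of a pretrained transformer. `run_model`, `causal_trace`, the scalar readout, and the deterministic noise shift are illustrative assumptions, not the authors' setup.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Vector = List[float]
Cache = Dict[Tuple[int, int], Vector]


def run_model(embeddings: List[Vector], n_layers: int,
              patch: Optional[Tuple[int, int, Vector]] = None) -> Tuple[float, Cache]:
    """Hypothetical toy causal residual model standing in for a transformer.

    At each layer, every position adds tanh of the uniform average of the
    states at positions 0..pos (crude causal attention). If `patch` is
    (layer, pos, vector), that state is overwritten after it is computed.
    """
    states = [list(e) for e in embeddings]
    cache: Cache = {}
    for layer in range(n_layers):
        new_states = []
        for pos in range(len(states)):
            dim = len(states[pos])
            # Causal mixing: only positions <= pos are visible.
            ctx = [sum(states[j][d] for j in range(pos + 1)) / (pos + 1) for d in range(dim)]
            h = [x + math.tanh(c) for x, c in zip(states[pos], ctx)]
            if patch is not None and (layer, pos) == (patch[0], patch[1]):
                h = list(patch[2])  # restore the clean activation here
            cache[(layer, pos)] = h
            new_states.append(h)
        states = new_states
    # Hypothetical scalar readout, e.g. a stand-in for a figurative-vs-literal logit gap.
    return sum(states[-1]), cache


def causal_trace(clean: List[Vector], corrupt_positions: Set[int], n_layers: int,
                 noise: float = 3.0) -> Tuple[float, float, Dict[Tuple[int, int], float]]:
    """Corrupt the idiom tokens, then restore one clean state at a time."""
    clean_score, clean_cache = run_model(clean, n_layers)
    # Deterministic shift instead of the usual Gaussian noise, for reproducibility.
    corrupted = [[x + noise for x in v] if i in corrupt_positions else list(v)
                 for i, v in enumerate(clean)]
    corrupt_score, _ = run_model(corrupted, n_layers)
    effects: Dict[Tuple[int, int], float] = {}
    for (layer, pos), h in clean_cache.items():
        patched_score, _ = run_model(corrupted, n_layers, patch=(layer, pos, h))
        # Indirect effect: how much of the clean output this single state restores.
        effects[(layer, pos)] = patched_score - corrupt_score
    return clean_score, corrupt_score, effects


if __name__ == "__main__":
    # Four hypothetical tokens: a context word, two idiom tokens, a final token.
    tokens = [[0.2, -0.1], [0.4, 0.3], [-0.3, 0.5], [0.1, 0.1]]
    clean_s, corrupt_s, eff = causal_trace(tokens, corrupt_positions={1, 2}, n_layers=3)
    print(f"clean={clean_s:.3f} corrupted={corrupt_s:.3f}")
    for layer in range(3):
        row = "  ".join(f"{eff[(layer, p)]:+.3f}" for p in range(len(tokens)))
        print(f"layer {layer}: {row}")
```

Because the toy model is causal, the state at position 0 (before the idiom) never carries an effect, and restoring the final-layer state at the last position recovers the clean output exactly. In a real study the same loop would run over a pretrained model's cached activations, producing a layer-by-position map of where an interpretation is carried.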
Taxonomy
Topics: Language, Metaphor, and Cognition · Topic Modeling · Natural Language Processing Techniques
