Beneath the Surface: Investigating LLMs' Capabilities for Communicating with Subtext
Kabir Ahuja, Yuxuan Li, Andrew Kyle Lampinen

TL;DR
This paper evaluates whether large language models can understand and use subtext in communication, revealing their limitations and potential in nuanced, creative interactions.
Contribution
The study introduces four new evaluation suites to systematically assess LLMs' capabilities in understanding and communicating subtext, highlighting current weaknesses.
Findings
Models tend to communicate too literally: even the best-performing models generate literal clues 60% of the time in Visual Allusions.
Some models can reduce literal clues by 30-50% using common ground.
Models struggle to infer unspoken common ground when not explicitly provided.
Abstract
Human communication is fundamentally creative, and often makes use of subtext -- implied meaning that goes beyond the literal content of the text. Here, we systematically study whether language models can use subtext in communicative settings, and introduce four new evaluation suites to assess these capabilities. Our evaluation settings range from writing & interpreting allegories to playing multi-agent and multi-modal games inspired by the rules of board games like Dixit. We find that frontier models generally exhibit a strong bias towards overly literal, explicit communication, and thereby fail to account for nuanced constraints -- even the best-performing models generate literal clues 60% of the time in one of our environments -- Visual Allusions. However, we find that some models can sometimes make use of common ground with another party to help them communicate with subtext, achieving…
