When Contextual Inference Fails: Cancelability in Interactive Instruction Following

Natalia Bila; Kata Naszádi; Alexandra Mayn; Christof Monz

arXiv:2603.19997 · cs.CL · March 23, 2026

Open Access

TL;DR

This paper examines how large language models handle contextual inference and clarification requests in a collaborative block-building task. The models recognize when a speaker is unreliable but often fail to act on that knowledge when instructions are ambiguous.

Contribution

The paper introduces Build What I Mean (BWIM), an interactive benchmark that tests whether models resolve ambiguity through contextual inference or by requesting clarification.

Findings

01

Models detect speaker unreliability in explicit confidence ratings.

02

Models often fail to use detected unreliability to decide when to request clarification.

03

Models exhibit suboptimal clarification strategies.

Abstract

We investigate the separation of literal interpretation from contextual inference in a collaborative block-building task where a builder must resolve underspecified instructions using contextual inferences. Building on an existing two-speaker psycholinguistic paradigm -- which contrasts a pragmatically cooperative speaker with one who is only literally reliable -- we introduce Build What I Mean (BWIM), an interactive benchmark for contextual meaning construction. In BWIM, models must resolve ambiguity by either performing a contextual inference or requesting clarification at a small communication cost. Evaluating several state-of-the-art LLMs, we find a dissociation between judgment and action: while models detect speaker unreliability in explicit confidence ratings, they fail to exploit this information to guide efficient clarification behavior. Instead, we observe suboptimal…
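
The trade-off the abstract describes, inferring versus asking at a small cost, can be illustrated with a simple expected-cost rule. The following is a minimal sketch, not the paper's method or scoring scheme: the `Speaker` class, the probabilities, and the cost values are all hypothetical and chosen only for illustration. It assumes a builder asks for clarification whenever the expected cost of a wrong guess exceeds the cost of asking.

```python
from dataclasses import dataclass


@dataclass
class Speaker:
    name: str
    # Hypothetical estimate of how often the contextual inference
    # matches what this speaker actually meant (not from the paper).
    p_inference_correct: float


def expected_cost_of_guessing(p_correct: float, error_cost: float) -> float:
    """Expected penalty if the builder acts on the contextual inference."""
    return (1.0 - p_correct) * error_cost


def should_ask(speaker: Speaker, error_cost: float, ask_cost: float) -> bool:
    """Ask for clarification only when guessing is expected to cost more."""
    guess_cost = expected_cost_of_guessing(speaker.p_inference_correct, error_cost)
    return guess_cost > ask_cost


# Illustrative values only: a pragmatically cooperative speaker versus
# one who is only literally reliable.
cooperative = Speaker("cooperative", 0.95)
literal_only = Speaker("literal_only", 0.50)

ERROR_COST = 1.0  # assumed penalty for a wrong placement
ASK_COST = 0.2    # assumed small communication cost

decisions = {
    s.name: should_ask(s, ERROR_COST, ASK_COST)
    for s in (cooperative, literal_only)
}
print(decisions)
```

Under these assumed numbers the rule asks only the literal-only speaker for clarification. The dissociation reported in the abstract corresponds to a builder that estimates the reliability correctly but does not let that estimate change its asking behavior.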

Taxonomy

Topics: Speech and dialogue systems · Topic Modeling · Neurobiology of Language and Bilingualism