TL;DR
This paper introduces a method for large language models to explicitly enumerate multiple interpretations of ambiguous requests, improving transparency and coverage without needing clarification questions.
Contribution
The authors propose a reinforcement learning approach that generates structured responses listing possible interpretations with corresponding answers, enhancing interpretability and safety.
Findings
Higher coverage of valid answers compared to baselines
Human evaluation confirms meaningful and explanatory interpretations
Efficient one-step generation supports downstream applications
Abstract
Large language models often respond to ambiguous requests by implicitly committing to one interpretation, frustrating users and creating safety risks when that interpretation is wrong. We propose generating a single structured response that enumerates the different ways an ambiguous request can be interpreted, each coupled with a corresponding answer. Our models are trained with reinforcement learning using a dual reward objective: recall on ambiguous inputs to maximise coverage of valid interpretations, and precision on unambiguous ones to suppress spurious alternatives. Training requires only multiple valid answers per input as supervision; no clarification questions or explicit interpretations are needed. Experiments on conversational question answering and semantic parsing demonstrate that our method achieves higher coverage of valid answers than baseline approaches. Human…
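The dual reward described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes that an input counts as ambiguous when it has more than one gold answer, and that answers are matched by exact string comparison after simple normalization. The names `Interpretation`, `normalize`, and `dual_reward` are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interpretation:
    """One entry of a structured response (hypothetical schema)."""
    reading: str  # how the request is being interpreted
    answer: str   # the answer under that reading


def normalize(text: str) -> str:
    # Assumption: lowercase + whitespace collapsing is enough for matching.
    return " ".join(text.lower().split())


def dual_reward(response: list[Interpretation], gold_answers: set[str]) -> float:
    """Recall on ambiguous inputs, precision on unambiguous ones.

    Assumption: an input is 'ambiguous' when it has more than one gold answer.
    """
    predicted = {normalize(item.answer) for item in response}
    gold = {normalize(answer) for answer in gold_answers}
    if not predicted or not gold:
        return 0.0
    hits = len(predicted & gold)
    if len(gold) > 1:
        # Ambiguous input: reward coverage of the valid answers.
        return hits / len(gold)
    # Unambiguous input: penalize spurious extra interpretations.
    return hits / len(predicted)


response = [
    Interpretation("the original film", "1989"),
    Interpretation("the remake", "2014"),
]
ambiguous_score = dual_reward(response, {"1989", "2014"})
unambiguous_score = dual_reward(response, {"1989"})
```

Under these assumptions, the same two-interpretation response scores 1.0 when both answers are valid and 0.5 when only one is, which is the pressure the abstract describes: enumerate readings when the request is ambiguous, and avoid spurious alternatives when it is not.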
