Learning Steerable Clarification Policies with Collaborative Self-play

Jonathan Berant; Maximillian Chen; Adam Fisch; Reza Aghajani; Fantine Huot; Mirella Lapata; Jacob Eisenstein

arXiv:2512.04068 · cs.LG · January 14, 2026

TL;DR

This paper introduces a method for training AI assistant policies to manage ambiguous queries using self-play and reinforcement learning, enabling context-dependent and cost-sensitive clarification strategies.

Contribution

It presents a self-play training approach for steerable clarification policies. These policies adapt their behavior to the clarification costs they receive as input, which improves response accuracy and lets a single policy be tuned to different contexts.
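The idea of a cost-steerable choice among guessing, enumerating, and clarifying can be illustrated with a hand-written expected-reward rule. This is a minimal sketch for intuition only. The paper trains its policy with self-play and reinforcement learning rather than computing it explicitly. The reward model here is an assumption: a 0/1 correctness reward, a clarifying question that always reveals the intent, and a hypothetical per-intent cost for enumeration. The names `choose_action` and `Decision` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Decision:
    action: str             # "answer", "enumerate", or "clarify"
    num_intents: int        # intents addressed in the reply (0 when asking a question)
    expected_reward: float


def choose_action(intent_probs, question_cost, per_intent_cost, max_enumerate=3):
    """Return the action with the highest expected reward under toy assumptions.

    Hypothetical assumptions (not from the paper):
      * reward is 1 if the reply covers the user's true intent, else 0;
      * one clarifying question always reveals the intent and costs `question_cost`;
      * listing k intents costs `per_intent_cost` for each intent beyond the first.
    """
    probs = sorted(intent_probs, reverse=True)
    # (a) Guess: answer only the most likely intent.
    options = [Decision("answer", 1, probs[0])]
    # (b) Enumerate: answer the top-k intents, paying a per-intent length cost.
    for k in range(2, min(max_enumerate, len(probs)) + 1):
        options.append(Decision("enumerate", k, sum(probs[:k]) - per_intent_cost * (k - 1)))
    # (c) Clarify: ask first, then answer the revealed intent.
    options.append(Decision("clarify", 0, 1.0 - question_cost))
    # On ties, max() keeps the earliest option in the list.
    return max(options, key=lambda d: d.expected_reward)


probs = [0.5, 0.3, 0.2]
print(choose_action(probs, question_cost=0.0, per_intent_cost=0.05).action)  # clarify
print(choose_action(probs, question_cost=0.5, per_intent_cost=0.05).action)  # enumerate
print(choose_action(probs, question_cost=0.6, per_intent_cost=0.5).action)   # answer
```

Raising `question_cost` moves the choice away from clarifying. A large per-intent cost, standing in for a small screen or a voice interface, makes enumeration unattractive. This mirrors the context dependence the paper targets.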

Findings

1. Steerable policies improve accuracy when handling ambiguous queries.
2. The approach generalizes to cost values not seen during training.
3. Reinforced Self-Training improves policy performance.

Abstract

To handle underspecified or ambiguous queries, AI assistants need a policy for managing their uncertainty to determine (a) when to guess the user intent and answer directly, (b) when to enumerate and answer multiple possible intents, and (c) when to ask a clarifying question. However, such policies are contextually dependent on factors such as user preferences or modality. For example, enumerating multiple possible user intentions is cumbersome on small screens or in a voice setting. In this work, we propose to train steerable policies for managing this uncertainty using self-play. Given two agents, one simulating a user and the other an AI assistant, we generate conversations where the user issues a potentially ambiguous query, and the assistant needs to determine how to respond. Importantly, the model takes as input the numerical cost of each clarification question, and each generated…
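The self-play data-generation loop described in the abstract can be sketched as follows. The sketch assumes a toy setting:

* A simulated user samples a hidden intent behind an ambiguous query.
* A stand-in assistant policy receives the per-question cost as input.
* Each episode is scored with a hypothetical reward: 1 if the intent is covered, minus the question cost when the assistant clarifies, minus a hypothetical `enum_cost` for each extra intent it lists.

The abstract is truncated before describing the actual reward, so this is not the authors' objective. The last step filters episodes in the spirit of Reinforced Self-Training, keeping the highest-reward episodes for each cost value as fine-tuning data. All data, function names, and constants are illustrative.

```python
import random
from dataclasses import dataclass

# Toy data (hypothetical): each ambiguous query maps to its candidate intents.
AMBIGUOUS_QUERIES = {
    "When did the war end?": ["WWI", "WWII", "Korean War"],
    "How tall is Jordan?": ["Michael Jordan", "Jordan Peterson"],
}


@dataclass
class Episode:
    query: str
    true_intent: str
    question_cost: float   # cost per clarifying question, given to the assistant as input
    action: str            # "answer", "enumerate", or "clarify"
    reward: float


def reward_fn(action, covered, question_cost, num_listed, enum_cost):
    """Hypothetical reward: 1 if the true intent is covered, minus interaction costs."""
    r = 1.0 if covered else 0.0
    if action == "clarify":
        r -= question_cost
    if action == "enumerate":
        r -= enum_cost * (num_listed - 1)
    return r


def self_play_episode(rng, policy, question_cost, enum_cost):
    """One simulated user/assistant exchange; the assistant sees the costs as input."""
    query, intents = rng.choice(list(AMBIGUOUS_QUERIES.items()))
    true_intent = rng.choice(intents)  # known to the simulated user, hidden from the assistant
    action = policy(rng, query, question_cost, enum_cost)
    if action == "answer":
        covered, listed = rng.choice(intents) == true_intent, 1
    elif action == "enumerate":
        covered, listed = True, len(intents)
    else:  # "clarify": the simulated user reveals the intent, then the assistant answers
        covered, listed = True, 1
    r = reward_fn(action, covered, question_cost, listed, enum_cost)
    return Episode(query, true_intent, question_cost, action, r)


def random_policy(rng, query, question_cost, enum_cost):
    """Stand-in for an LLM assistant: ignores its inputs and samples an action uniformly."""
    return rng.choice(["answer", "enumerate", "clarify"])


def rest_filter(episodes, keep_fraction=0.5):
    """ReST-style filtering: keep the top-reward episodes separately for each cost value."""
    by_cost = {}
    for ep in episodes:
        by_cost.setdefault(ep.question_cost, []).append(ep)
    kept = []
    for group in by_cost.values():
        group.sort(key=lambda e: e.reward, reverse=True)
        kept.extend(group[: max(1, int(len(group) * keep_fraction))])
    return kept


rng = random.Random(0)
episodes = [self_play_episode(rng, random_policy, c, enum_cost=0.3)
            for c in (0.0, 0.5, 1.0) for _ in range(20)]
kept = rest_filter(episodes)
print(len(episodes), len(kept))  # 60 30
```

The filter runs within each cost value rather than over the pooled episodes, so the training set keeps examples for both cheap and expensive questions. That is one plausible way to preserve the cost-conditioned behavior needed for steering. In a real setup, `random_policy` would presumably be replaced by the assistant model, and the kept episodes would feed the next round of fine-tuning.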

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Topic Modeling · Multimodal Machine Learning Applications