Teaching Language Models To Gather Information Proactively
Tenghao Huang, Sihao Chen, Muhao Chen, Jonathan May, Longqi Yang, Mengting Wan, Pei Zhou

TL;DR
This paper introduces a new task paradigm for large language models to proactively gather missing information through strategic questioning, enhancing their collaborative problem-solving capabilities in ambiguous real-world scenarios.
Contribution
The paper proposes a scalable framework and reinforcement finetuning method enabling LLMs to identify information gaps and elicit implicit user knowledge effectively.
Findings
The fine-tuned Qwen-2.5-7B model outperforms o3-mini by 18% on automatic evaluation metrics.
Human evaluators prefer the model's clarification questions and outlines by 42% and 28%, respectively.
Proactive clarification improves LLM collaboration quality.
Abstract
Large language models (LLMs) are increasingly expected to function as collaborative partners, engaging in back-and-forth dialogue to solve complex, ambiguous problems. However, current LLMs often falter in real-world settings, defaulting to passive responses or narrow clarifications when faced with incomplete or under-specified prompts, falling short of proactively gathering the missing information that is crucial for high-quality solutions. In this work, we introduce a new task paradigm: proactive information gathering, where LLMs must identify gaps in the provided context and strategically elicit implicit user knowledge through targeted questions. To systematically study and train this capability, we design a scalable framework that generates partially specified, real-world tasks, masking key information and simulating authentic ambiguity. Within this setup, our core innovation is a…
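The masking setup described in the abstract can be pictured with a toy sketch. This is not the authors' framework or training method: the names `MaskedTask`, `mask_task`, `ask`, and `recovered_fraction`, the example task, and the scoring rule are all hypothetical, chosen only to show how hiding key details turns a fully specified task into one where targeted questions pay off.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class MaskedTask:
    """A task split into model-visible context and user-only hidden details (hypothetical)."""
    visible: dict
    hidden: dict
    revealed: dict = field(default_factory=dict)

    def ask(self, key: str) -> Optional[str]:
        """Simulated user: answers only when the question targets a hidden detail."""
        if key in self.hidden:
            self.revealed[key] = self.hidden[key]
            return self.hidden[key]
        return None  # the question was redundant or off-target

    def recovered_fraction(self) -> float:
        """Share of hidden details elicited so far (a toy score, not the paper's metric)."""
        return len(self.revealed) / len(self.hidden) if self.hidden else 1.0


def mask_task(full_spec: dict, keys_to_mask: set) -> MaskedTask:
    """Hide the chosen keys from the model-facing context."""
    visible = {k: v for k, v in full_spec.items() if k not in keys_to_mask}
    hidden = {k: v for k, v in full_spec.items() if k in keys_to_mask}
    return MaskedTask(visible=visible, hidden=hidden)


# Assumed example task; the fields are illustrative only.
spec = {"goal": "plan a product launch", "budget": "20k USD", "deadline": "end of Q3"}
task = mask_task(spec, {"budget", "deadline"})
task.ask("budget")  # targeted question: reveals a hidden detail
task.ask("goal")    # redundant question: the goal is already visible
print(task.recovered_fraction())  # 0.5
```

In this sketch, a question about an already visible field earns nothing, while a question that hits a masked field raises the score, which is the intuition behind rewarding strategic rather than generic clarification.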
