Pause or Fabricate? Training Language Models for Grounded Reasoning
Yiwen Qiu, Linjuan Wu, Yizhou Liu, Yuchen Yan, Jin Ma, Xu Tan, Yao Hu, Daoxin Zhang, Wenqi Zhang, Weiming Lu, Jun Xiao, Yongliang Shen

TL;DR
This paper introduces GRIL, a reinforcement learning framework that improves grounded reasoning in language models by enabling them to recognize information gaps and pause for clarification, reducing hallucinations.
Contribution
The paper proposes a novel multi-turn RL approach with stage-specific rewards to enhance premise detection and prevent ungrounded reasoning in language models.
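This summary does not say how the stage-specific rewards are defined. The Python sketch below is only a rough illustration of how such rewards could be combined. It assumes three things: a binary reward for the pause decision, an exact-match reward for the final answer, and equal weights. `Turn`, `clarify_reward`, `answer_reward`, `episode_reward`, and the 0.5 weights are all hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    """One model turn in a hypothetical multi-turn episode."""
    paused: bool                  # True if the model stopped to ask for a missing premise
    answer: Optional[str] = None  # final answer text, if the model committed to one


def clarify_reward(first: Turn, premise_missing: bool) -> float:
    """Stage 1 (assumed): 1.0 when the pause decision matches whether a premise is missing."""
    return 1.0 if first.paused == premise_missing else 0.0


def answer_reward(last: Turn, gold: str) -> float:
    """Stage 2 (assumed): 1.0 for an exact-match correct final answer, else 0.0."""
    if last.paused or last.answer is None:
        return 0.0
    return 1.0 if last.answer.strip() == gold.strip() else 0.0


def episode_reward(turns: List[Turn], premise_missing: bool, gold: str,
                   w_clarify: float = 0.5, w_answer: float = 0.5) -> float:
    """Weighted sum of both stage rewards, with the answer reward gated on stage 1."""
    r1 = clarify_reward(turns[0], premise_missing)
    # Credit the final answer only if the first decision was right, so an answer
    # produced without the missing premise earns nothing for stage 2.
    r2 = answer_reward(turns[-1], gold) if r1 == 1.0 else 0.0
    return w_clarify * r1 + w_answer * r2


if __name__ == "__main__":
    fabricated = [Turn(paused=False, answer="12")]
    grounded = [Turn(paused=True), Turn(paused=False, answer="12")]
    print(episode_reward(fabricated, premise_missing=True, gold="12"))  # 0.0
    print(episode_reward(grounded, premise_missing=True, gold="12"))    # 1.0
```

Gating the answer reward on a correct pause decision means that a fabricated answer earns nothing even when it happens to be right. This is one way to discourage ungrounded reasoning, and the paper's actual reward shaping may differ.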
Findings
GRIL improves premise detection accuracy by up to 45%.
Task success increases by 30% with GRIL.
Average response length decreases by over 20% with the proposed method.
Abstract
Large language models have achieved remarkable progress on complex reasoning tasks. However, they often implicitly fabricate information when inputs are incomplete, producing confident but unreliable conclusions, a failure mode we term ungrounded reasoning. We argue that this issue arises not from insufficient reasoning capability, but from the lack of inferential boundary awareness: the ability to recognize when the necessary premises for valid inference are missing. To address this issue, we propose Grounded Reasoning via Interactive Reinforcement Learning (GRIL), a multi-turn reinforcement learning framework for grounded reasoning under incomplete information. GRIL decomposes the reasoning process into two stages: clarify and pause, which identifies whether the available information is sufficient, and grounded reasoning, which performs task solving once the necessary premises are…
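The two-stage decomposition can be pictured as a rollout loop. This is a hypothetical illustration, not GRIL's implementation. It assumes two components:

- a policy that returns either a `"pause"` action naming a missing premise or an `"answer"` action;
- a simulated user (`hidden_premises`) that reveals a premise when the model asks for it.

`run_episode`, `toy_policy`, `max_turns`, and the apple example are invented for this sketch.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical policy interface: conversation so far -> (action, text).
# action is "pause" (text names the missing premise) or "answer" (text is the answer).
Policy = Callable[[List[str]], Tuple[str, str]]


def run_episode(policy: Policy, question: str, hidden_premises: Dict[str, str],
                max_turns: int = 3) -> Tuple[List[str], str]:
    """Alternate clarify-and-pause turns with a simulated user until the model answers."""
    history = [question]
    for _ in range(max_turns):
        action, text = policy(history)
        history.append(text)
        if action == "answer":
            # Grounded reasoning stage: answer using the premises gathered so far.
            return history, text
        # Clarify-and-pause stage: the simulated user supplies the requested premise
        # if it exists, otherwise says it is unavailable.
        history.append(hidden_premises.get(text, "I don't have that information."))
    return history, ""  # no answer within the turn budget


def toy_policy(history: List[str]) -> Tuple[str, str]:
    """Rule-based stand-in for a model: pause until the price is known, then answer."""
    context = " ".join(history)
    if "costs" not in context:
        return "pause", "price"
    return "answer", "6"


if __name__ == "__main__":
    question = "Tom buys 3 apples. How much does he pay?"
    history, answer = run_episode(toy_policy, question,
                                  {"price": "Each apple costs 2 dollars."})
    print(answer)  # 6
```

Here the toy policy pauses once, receives the missing price, and only then answers. If the premise is never supplied, it keeps pausing rather than guessing until the turn budget runs out.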
