Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints
Zhenyun Yin, Shujie Wang, Xuhong Wang, Xingjun Ma, Yinchun Wang

TL;DR
This paper introduces Deliberative Searcher, a reinforcement learning framework that enhances LLM reliability by integrating certainty calibration with retrieval-based verification, resulting in more trustworthy answers.
Contribution
The authors present Deliberative Searcher as the first framework to combine certainty calibration with retrieval-based search, trained with reinforcement learning, for open-domain QA.
Findings
Improves alignment between model confidence and correctness.
Enhances trustworthiness of LLM outputs.
Utilizes multi-step reflection and verification over Wikipedia.
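One common way to quantify "alignment between model confidence and correctness" is expected calibration error (ECE). The summary does not say which metric the paper reports, so the sketch below is only an illustration of the idea; the function name, the equal-width binning, and the default of 10 bins are assumptions.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean of |accuracy - mean confidence| over equal-width bins.

    confidences: floats in [0, 1], one stated confidence per answer.
    correct:     booleans, whether each answer was right.
    Illustrative only; not taken from the paper.
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    if n == 0:
        return 0.0

    # Assign each prediction to a confidence bin; 1.0 falls in the last bin.
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Each bin contributes its gap, weighted by its share of the data.
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece
```

A model that is always fully confident but right only a quarter of the time would score 0.75 under this measure, while a model whose confidence matches its hit rate in every bin would score 0.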
Abstract
Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose Deliberative Searcher, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes for accuracy under a soft reliability constraint. Empirical results show that the proposed method improves alignment between model confidence and correctness, leading to more trustworthy outputs. This paper will be continuously updated.
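The abstract does not spell out how the soft reliability constraint enters the training objective. One standard way to optimize a reward under a soft constraint is Lagrangian relaxation, sketched below under stated assumptions: the violation measure (confidence placed on a wrong answer), the `budget` parameter, and both function names are hypothetical and are not taken from the paper.

```python
def constrained_reward(is_correct, confidence, lam):
    """Accuracy reward minus a weighted penalty for confident mistakes.

    Assumption: a reliability violation is the confidence the agent
    assigned to an answer that turned out to be wrong.
    Returns (shaped_reward, violation).
    """
    violation = 0.0 if is_correct else confidence
    reward = (1.0 if is_correct else 0.0) - lam * violation
    return reward, violation


def update_multiplier(lam, mean_violation, budget, lr=0.1):
    """Dual ascent step on the Lagrange multiplier.

    The multiplier grows while the average violation exceeds the allowed
    budget and shrinks (never below zero) once the constraint is met.
    """
    return max(0.0, lam + lr * (mean_violation - budget))
```

In a training loop of this shape, the policy would be updated on the shaped reward, and the multiplier would be adjusted after each batch from the batch's mean violation, so the penalty tightens only while the constraint is being broken.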
