Deliberative Searcher: Improving LLM Reliability via Reinforcement Learning with constraints

Zhenyun Yin; Shujie Wang; Xuhong Wang; Xingjun Ma; Yinchun Wang

arXiv:2507.16727·cs.AI·April 20, 2026

TL;DR

This paper introduces Deliberative Searcher, a reinforcement learning framework that enhances LLM reliability by integrating certainty calibration with retrieval-based verification, resulting in more trustworthy answers.

Contribution

The paper presents Deliberative Searcher as the first framework to combine certainty calibration with retrieval-based search for open-domain QA, trained with reinforcement learning.

Findings

01. Improves alignment between model confidence and correctness.

02. Enhances trustworthiness of LLM outputs.

03. Utilizes multi-step reflection and verification over Wikipedia.

Abstract

Improving the reliability of large language models (LLMs) is critical for deploying them in real-world scenarios. In this paper, we propose Deliberative Searcher, the first framework to integrate certainty calibration with retrieval-based search for open-domain question answering. The agent performs multi-step reflection and verification over Wikipedia data and is trained with a reinforcement learning algorithm that optimizes for accuracy under a soft reliability constraint. Empirical results show that the proposed method improves alignment between model confidence and correctness, leading to more trustworthy outputs. This paper will be continuously updated.
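
The abstract does not spell out how the soft reliability constraint enters training. One common way to optimize a reward under a soft constraint is Lagrangian relaxation, and the sketch below illustrates that general idea only; it is not taken from the paper. The `Episode` type, the violation measure (confidence placed on a wrong answer), the `budget`, and the learning rate are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    """One question-answering rollout (hypothetical structure)."""
    correct: bool      # whether the final answer matched the reference
    confidence: float  # certainty stated by the model, in [0, 1]


def reliability_violation(ep: Episode) -> float:
    # Assumed constraint cost: confidence spent on a wrong answer.
    # A correct answer incurs no cost regardless of stated confidence.
    return 0.0 if ep.correct else ep.confidence


def penalized_reward(ep: Episode, lam: float) -> float:
    # Accuracy reward minus the multiplier-weighted constraint cost.
    accuracy = 1.0 if ep.correct else 0.0
    return accuracy - lam * reliability_violation(ep)


def update_multiplier(lam: float, batch: list[Episode],
                      budget: float, lr: float = 0.1) -> float:
    # Dual ascent: raise lam when the average violation exceeds the
    # budget, lower it otherwise, and keep it non-negative.
    avg = sum(reliability_violation(ep) for ep in batch) / len(batch)
    return max(0.0, lam + lr * (avg - budget))
```

Under these assumptions, a policy-gradient learner would maximize `penalized_reward` while `update_multiplier` is called once per batch, so overconfident wrong answers become more costly whenever the batch exceeds the allowed budget.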
