BAPO: Boundary-Aware Policy Optimization for Reliable Agentic Search

Shiyu Liu; Yongjing Yin; Jianhao Yan; Yunbo Tang; Qinggang Zhang; Bei Li; Xin Chen; Jingang Wang; Xunliang Cai; Jinsong Su

arXiv:2601.11037 · cs.AI · April 22, 2026


TL;DR

BAPO is a reinforcement learning framework that improves the reliability of agentic search in LLMs by enabling agents to recognize their reasoning limits and answer "I DON'T KNOW" when appropriate.

Contribution

It introduces a novel boundary-aware reward mechanism and an adaptive reward modulator to enhance agent reliability without sacrificing accuracy.

Findings

01. BAPO significantly increases the correct use of "I DON'T KNOW" responses.
02. It improves the overall reliability of agentic search across four benchmarks.
03. It maintains high accuracy while reducing unreliable answers.

Abstract

RL-based agentic search enables LLMs to solve complex questions via dynamic planning and external search. While this approach significantly enhances accuracy with agent policies optimized via large-scale reinforcement learning, we identify a critical gap in reliability: these agents fail to recognize their reasoning boundaries and rarely admit "I DON'T KNOW" (IDK) even when evidence is insufficient or reasoning reaches its limit. The lack of reliability often leads to plausible but unreliable answers, introducing significant risks in many real-world scenarios. To this end, we propose Boundary-Aware Policy Optimization (BAPO), a novel RL framework designed to cultivate reliable boundary awareness without compromising accuracy. BAPO introduces two key components: (i) a group-based boundary-aware reward that encourages an IDK response only when the reasoning reaches its limit, and (ii)…
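The abstract says only that the reward is group-based and favors IDK when reasoning reaches its limit. One way to read "group-based" is sketched below under a loudly stated assumption: rollouts for the same question are scored together, and the question counts as beyond the agent's boundary when no rollout in the group answers it correctly. This is an illustrative guess, not BAPO's actual reward; the names `Rollout` and `boundary_aware_rewards` and the reward values `idk_reward`, `wrong_reward`, `correct_reward` are all hypothetical.

```python
from dataclasses import dataclass
from typing import List

IDK = "I DON'T KNOW"


@dataclass
class Rollout:
    answer: str    # final answer string produced by one rollout
    correct: bool  # whether the answer matched the gold answer (False for IDK)


def boundary_aware_rewards(
    group: List[Rollout],
    idk_reward: float = 0.5,      # hypothetical value, not from the paper
    wrong_reward: float = 0.0,    # hypothetical value, not from the paper
    correct_reward: float = 1.0,  # hypothetical value, not from the paper
) -> List[float]:
    """Hypothetical group-based boundary-aware reward (assumption, not BAPO's).

    The question is treated as beyond the agent's boundary when no non-IDK
    rollout in the group is correct; only then does an IDK answer earn credit.
    """
    any_correct = any(r.correct for r in group if r.answer != IDK)
    rewards = []
    for r in group:
        if r.answer == IDK:
            # Credit abstention only when the whole group failed to answer.
            rewards.append(wrong_reward if any_correct else idk_reward)
        elif r.correct:
            rewards.append(correct_reward)
        else:
            rewards.append(wrong_reward)
    return rewards


if __name__ == "__main__":
    solvable = [Rollout("Paris", True), Rollout(IDK, False), Rollout("Lyon", False)]
    unsolvable = [Rollout(IDK, False), Rollout("Lyon", False), Rollout("Nice", False)]
    print(boundary_aware_rewards(solvable))    # [1.0, 0.0, 0.0]
    print(boundary_aware_rewards(unsolvable))  # [0.5, 0.0, 0.0]
```

In this sketch, IDK is never rewarded on a question that some sibling rollout solved, so the policy cannot gain by abstaining on answerable questions, while on questions that no rollout solved it scores above a confident wrong answer.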


Code & Models

Repositories

liushiyu-0709/BAPO-Reliable-Search (GitHub)
