Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty

Peilin Wu; Mian Zhang; Xinlu Zhang; Xinya Du; Zhiyu Zoey Chen

arXiv:2505.17281 · cs.CL · October 10, 2025


TL;DR

This paper identifies sub-optimal search behaviors in agentic RAG systems, links them to the model's uncertainty about its own knowledge boundaries, and proposes a reinforcement learning method that improves search decisions and answer accuracy.

Contribution

It formally defines and quantifies over-search and under-search behaviors, and introduces $\beta$-GRPO, a reinforcement learning approach that reduces uncertainty-driven inefficiencies in agentic RAG systems.
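The page does not spell out how the two behaviors are measured. As a rough illustration, assuming each reasoning step has been labeled with whether the agent searched, whether the model could have answered that step from its own knowledge, and whether the step ended up correct, the two rates could be computed as below. The `Step` schema and both function names are hypothetical, not the paper's code.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class Step:
    """One reasoning step of an agentic RAG trajectory (hypothetical schema)."""
    searched: bool                # did the agent issue a search at this step?
    correct_without_search: bool  # could the model answer this step from its own knowledge?
    step_correct: bool            # was the content produced at this step correct?


def over_search_rate(steps: Sequence[Step]) -> float:
    """Fraction of search steps that were redundant: the model already knew the answer."""
    search_steps = [s for s in steps if s.searched]
    if not search_steps:
        return 0.0
    redundant = sum(s.correct_without_search for s in search_steps)
    return redundant / len(search_steps)


def under_search_rate(steps: Sequence[Step]) -> float:
    """Fraction of non-search steps that went wrong: a search was needed but skipped."""
    no_search_steps = [s for s in steps if not s.searched]
    if not no_search_steps:
        return 0.0
    missed = sum(not s.step_correct for s in no_search_steps)
    return missed / len(no_search_steps)


# Usage on a toy trajectory: three search steps, two non-search steps.
steps = [
    Step(searched=True, correct_without_search=True, step_correct=True),
    Step(searched=True, correct_without_search=False, step_correct=True),
    Step(searched=True, correct_without_search=False, step_correct=True),
    Step(searched=False, correct_without_search=True, step_correct=True),
    Step(searched=False, correct_without_search=False, step_correct=False),
]
print(round(over_search_rate(steps), 3))  # 0.333
print(under_search_rate(steps))           # 0.5
```

Treating "answerable without search" as evidence of redundancy mirrors the description of over-search as retrieving redundant information; the paper's actual probing procedure may differ.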

Findings

01

$\beta$-GRPO improves search decision accuracy.

02

Models trained with $\beta$-GRPO outperform baselines by 4% in exact match.

03

Sub-optimal search behaviors are linked to model uncertainty.
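One way to probe the link between sub-optimal searches and uncertainty is to score each search decision by how confident the model was, then compare answer accuracy across confidence buckets. The sketch below assumes a decision's confidence is the lowest token probability in its span and uses equal-width buckets; both choices, and the function names `decision_confidence` and `accuracy_by_confidence`, are illustrative assumptions rather than the paper's measurement.

```python
import math
from collections import defaultdict


def decision_confidence(token_logprobs: list[float]) -> float:
    """Assumed certainty score: the lowest token probability in the decision span.

    Expects at least one log-probability.
    """
    return min(math.exp(lp) for lp in token_logprobs)


def accuracy_by_confidence(records: list[tuple[float, bool]], n_bins: int = 4) -> dict[int, float]:
    """Group (confidence, answer_correct) pairs into equal-width bins over [0, 1]."""
    hits: dict[int, list[bool]] = defaultdict(list)
    for conf, correct in records:
        b = min(int(conf * n_bins), n_bins - 1)  # confidence 1.0 falls into the top bin
        hits[b].append(correct)
    # Accuracy per non-empty bin, ordered from least to most confident.
    return {b: sum(v) / len(v) for b, v in sorted(hits.items())}


# Usage with toy data: low-confidence decisions are right less often.
records = [(0.1, False), (0.2, False), (0.15, True), (0.9, True), (0.85, True), (0.8, False)]
print(accuracy_by_confidence(records, n_bins=2))
```

Accuracy that rises from low- to high-confidence buckets would be consistent with the finding that sub-optimal search behaviors track model uncertainty.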

Abstract

Agentic Retrieval-Augmented Generation (RAG) systems enhance Large Language Models (LLMs) by enabling dynamic, multi-step reasoning and information retrieval. However, these systems often exhibit sub-optimal search behaviors such as over-search (retrieving redundant information) and under-search (failing to retrieve necessary information), which hinder efficiency and reliability. This work formally defines and quantifies these behaviors, revealing their prevalence across multiple QA datasets and agentic RAG systems (e.g., one model could have avoided searching in 27.7% of its search steps). Furthermore, we demonstrate a crucial link between these inefficiencies and the models' uncertainty regarding their own knowledge boundaries: response accuracy correlates with the model's uncertainty in its search decisions. To address this, we propose $\beta$-GRPO, a reinforcement learning-based…
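Judging by its name, $\beta$-GRPO builds on GRPO (Group Relative Policy Optimization), whose core step samples several rollouts for the same question and normalizes each rollout's reward against its group. The abstract is cut off before describing how $\beta$ enters the objective, so `hypothetical_reward` below, which credits a correct answer only when every search decision's confidence is at least a threshold `beta`, is an assumption for illustration; `group_relative_advantages` is the standard GRPO normalization, here using the population standard deviation.

```python
import statistics


def hypothetical_reward(answer_correct: bool, decision_confidences: list[float], beta: float) -> float:
    """ASSUMED reward: a correct answer earns credit only if every search
    decision in the rollout was made with confidence >= beta.
    Rollouts with no searches are judged on correctness alone."""
    confident = all(c >= beta for c in decision_confidences)
    return 1.0 if answer_correct and confident else 0.0


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """GRPO-style advantage: (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std over the sampled group
    return [(r - mean) / (std + eps) for r in rewards]


# Usage: four rollouts for one question, each (answer_correct, per-search confidences).
rollouts = [(True, [0.9, 0.8]), (True, [0.9, 0.4]), (False, [0.95]), (True, [])]
rewards = [hypothetical_reward(ok, confs, beta=0.7) for ok, confs in rollouts]
print(rewards)  # [1.0, 0.0, 0.0, 1.0]
print([round(a, 3) for a in group_relative_advantages(rewards)])  # [1.0, -1.0, -1.0, 1.0]
```

When every rollout in a group receives the same reward, all advantages are zero, so a group contributes a learning signal only when its rollouts differ in outcome.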



Taxonomy

Methods: Layer Normalization · Linear Warmup With Linear Decay · Attention Dropout · Byte Pair Encoding · Softmax · Linear Layer · Dropout · Dense Connections · Attention Is All You Need