Over-Searching in Search-Augmented Large Language Models

Roy Xie; Deepak Gopinath; David Qiu; Dong Lin; Haitian Sun; Saloni Potdar; Bhuwan Dhingra

arXiv:2601.05503·cs.LG·March 12, 2026

TL;DR

This paper systematically evaluates over-searching in search-augmented large language models, shows how it affects accuracy, efficiency, and hallucination, and proposes a metric and mitigation strategies to address it.

Contribution

It introduces a comprehensive analysis of over-searching effects, proposes the Tokens Per Correctness metric, and releases OverSearchQA for future research.
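
The summary above names the Tokens Per Correctness metric but does not define it. As a rough illustration only, the sketch below assumes the metric is the total number of tokens consumed across a set of responses divided by the number of correct responses, so lower values mean cheaper correct answers. That formula, the `Response` record, and the `tokens_per_correctness` function are hypothetical names and assumptions made for this example; the paper's exact definition may differ.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Response:
    """One model response (hypothetical record, not from the paper)."""
    tokens_used: int  # tokens consumed, including any retrieved context
    correct: bool     # whether the response was judged correct


def tokens_per_correctness(responses: Iterable[Response]) -> float:
    """Assumed definition: total tokens / number of correct responses."""
    responses = list(responses)
    total_tokens = sum(r.tokens_used for r in responses)
    num_correct = sum(1 for r in responses if r.correct)
    if num_correct == 0:
        # No correct responses: the cost per correct answer is unbounded.
        return float("inf")
    return total_tokens / num_correct


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    batch = [
        Response(tokens_used=100, correct=True),
        Response(tokens_used=300, correct=False),
        Response(tokens_used=200, correct=True),
    ]
    print(tokens_per_correctness(batch))
```

Under this assumed definition, a system that searches more often without answering more questions correctly gets a higher value, which is the kind of inefficiency the over-searching analysis is concerned with.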

Findings

01. Search improves answer accuracy on answerable queries but harms abstention on unanswerable ones.

02. Over-searching is more pronounced in complex reasoning models and deep research systems, and is exacerbated by noisy retrieval.

03. Negative evidence in the retrieved evidence can improve abstention decisions.

Abstract

Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they often over-search -- unnecessarily invoking a search tool even when it does not improve response quality, which leads to computational inefficiency and to hallucinations caused by incorporating irrelevant context. In this work, we conduct a systematic evaluation of over-searching across multiple dimensions, including query types, model categories, retrieval conditions, and multi-turn conversations. Our findings show: (i) search generally improves answer accuracy on answerable queries but harms abstention on unanswerable ones; (ii) over-searching is more pronounced in complex reasoning models and deep research systems, is exacerbated by noisy retrieval, and compounds across turns in multi-turn conversations; and (iii) the composition of retrieved evidence is crucial, as…

Taxonomy

Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems