Over-Searching in Search-Augmented Large Language Models
Roy Xie, Deepak Gopinath, David Qiu, Dong Lin, Haitian Sun, Saloni Potdar, Bhuwan Dhingra

TL;DR
This paper systematically evaluates over-searching in search-augmented large language models, showing how it affects accuracy, efficiency, and hallucinations, and proposes metrics and mitigation strategies to improve these models' performance.
Contribution
It introduces a comprehensive analysis of over-searching effects, proposes the Tokens Per Correctness metric, and releases OverSearchQA for future research.
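This summary names the Tokens Per Correctness metric but does not define it. As a minimal sketch, assuming the metric is the total number of tokens consumed divided by the number of correct responses (an assumption based only on the name, not on the paper's definition), it could be computed as follows. The `Response` record and its fields are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass
class Response:
    # Hypothetical record; field names are illustrative, not from the paper.
    tokens_used: int  # tokens consumed producing this response, including any search context
    correct: bool     # whether the response was judged correct


def tokens_per_correctness(responses):
    """Assumed reading of the metric: total tokens / number of correct responses."""
    total_tokens = sum(r.tokens_used for r in responses)
    num_correct = sum(1 for r in responses if r.correct)
    if num_correct == 0:
        # No correct responses: the cost per correct answer is unbounded.
        return float("inf")
    return total_tokens / num_correct


runs = [Response(120, True), Response(900, False), Response(300, True)]
print(tokens_per_correctness(runs))  # 1320 tokens over 2 correct responses
```

Under this reading, a lower value means fewer tokens spent per correct response, so a model that searches without gaining accuracy would score worse. For unanswerable queries, one would presumably count an abstention as the correct response, though that choice is also an assumption here.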
Findings
Search improves answer accuracy on answerable queries but harms unanswerable ones.
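Over-searching is more pronounced in complex reasoning models and deep research systems, worsens with noisy retrieval, and compounds across turns in multi-turn conversations.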
Negative evidence in retrieved data can help improve abstention decisions.
Abstract
Search-augmented large language models (LLMs) excel at knowledge-intensive tasks by integrating external retrieval. However, they often over-search -- unnecessarily invoking a search tool even when it does not improve response quality, which leads to computational inefficiency and to hallucinations caused by incorporating irrelevant context. In this work, we conduct a systematic evaluation of over-searching across multiple dimensions, including query types, model categories, retrieval conditions, and multi-turn conversations. Our findings show: (i) search generally improves answer accuracy on answerable queries but harms abstention on unanswerable ones; (ii) over-searching is more pronounced in complex reasoning models and deep research systems, is exacerbated by noisy retrieval, and compounds across turns in multi-turn conversations; and (iii) the composition of retrieved evidence is crucial, as…
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Expert finding and Q&A systems
