Beyond Semantic Similarity: Rethinking Retrieval for Agentic Search via Direct Corpus Interaction
Zhuofeng Li, Haoxiang Zhang, Cong Wei, Pan Lu, Ping Nie, Yi Lu, Yuyang Bai, Shangbin Feng, Hangxiao Zhu, Ming Zhong, Yuyu Zhang, Jianwen Xie, Yejin Choi, James Zou, Jiawei Han, Wenhu Chen, Jimmy Lin, Dongfu Jiang, and Yu Zhang

TL;DR
This paper proposes direct corpus interaction (DCI) for agentic search, in which agents search raw corpora with simple tools instead of embeddings or prebuilt indices, and reports that it outperforms traditional retrieval methods on several benchmarks.
Contribution
The study introduces DCI as a novel retrieval approach that bypasses traditional indexing, improving agentic search performance and flexibility across multiple datasets.
Findings
DCI outperforms strong sparse, dense, and reranking baselines on several datasets.
DCI achieves high accuracy on BrowseComp-Plus and multi-hop QA tasks.
DCI does not rely on any conventional semantic retriever or offline indexing.
Abstract
Modern retrieval systems, whether lexical or semantic, expose a corpus through a fixed similarity interface that compresses access into a single top-k retrieval step before reasoning. This abstraction is efficient, but for agentic search, it becomes a bottleneck: exact lexical constraints, sparse clue conjunctions, local context checks, and multi-step hypothesis refinement are difficult to implement by calling a conventional off-the-shelf retriever, and evidence filtered out early cannot be recovered by stronger downstream reasoning. Agentic tasks further exacerbate this limitation because they require agents to orchestrate multiple steps, including discovering intermediate entities, combining weak clues, and revising the plan after observing partial evidence. To tackle the limitation, we study direct corpus interaction (DCI), where an agent searches the raw corpus directly with…
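The abstract contrasts a single top-k similarity call with an agent that works on the raw text itself. Below is a minimal Python sketch of two grep-style tools an agent could call in that setting. The abstract excerpt is cut off before it names the paper's actual tools, so the functions `grep_all` and `near`, the toy corpus, and the clue patterns are hypothetical illustrations, not the authors' implementation. `grep_all` applies an exact conjunction of sparse lexical clues across the whole corpus, so no candidate is dropped by a score cutoff, and `near` performs a local context check on a candidate document.

```python
import re
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str   # identifier of the matching document
    start: int    # character offset of the first clue's first match
    snippet: str  # local context around that match


def grep_all(corpus: dict[str, str], patterns: list[str], window: int = 40) -> list[Hit]:
    """Hypothetical tool: return every document in which ALL regex patterns match.

    Unlike a top-k similarity call, this scans the full raw corpus and applies
    an exact conjunction of lexical constraints, so nothing is filtered out by
    a similarity score before the agent sees it.
    """
    compiled = [re.compile(p, re.IGNORECASE) for p in patterns]
    hits: list[Hit] = []
    for doc_id, text in corpus.items():
        matches = [c.search(text) for c in compiled]
        if all(matches):  # every clue must be present in this document
            first = matches[0]
            lo = max(0, first.start() - window)
            hi = min(len(text), first.end() + window)
            hits.append(Hit(doc_id, first.start(), text[lo:hi]))
    return hits


def near(text: str, a: str, b: str, max_gap: int) -> bool:
    """Hypothetical tool: local context check that some match of pattern `a`
    starts within `max_gap` characters of some match of pattern `b`."""
    starts_a = [m.start() for m in re.finditer(a, text, re.IGNORECASE)]
    starts_b = [m.start() for m in re.finditer(b, text, re.IGNORECASE)]
    return any(abs(x - y) <= max_gap for x in starts_a for y in starts_b)


# Toy corpus, invented purely for illustration.
CORPUS = {
    "d1": "The bridge was completed in 1932 by a firm based in Leeds.",
    "d2": "A Leeds firm built several bridges; the largest opened in 1951.",
    "d3": "Completed in 1932, the tower stands in Glasgow.",
}

if __name__ == "__main__":
    # Step 1: conjunction of two sparse clues over the raw text.
    hits = grep_all(CORPUS, [r"\b1932\b", r"\bLeeds\b"])
    for h in hits:
        print(h.doc_id, h.start, h.snippet)
    # Step 2: check locally that the clues co-occur closely before trusting the hit.
    print(near(CORPUS["d1"], r"\b1932\b", r"\bLeeds\b", max_gap=30))
```

Requiring both "1932" and "Leeds" returns only `d1`, even though `d2` and `d3` each share one clue and could plausibly outrank it under a similarity score with a small k. After reading the snippet, the agent can tighten, relax, or swap patterns and search again, which is the kind of multi-step hypothesis refinement the abstract argues a fixed top-k interface makes difficult.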
