Cluster-based Adaptive Retrieval: Dynamic Context Selection for RAG Applications
Yifan Xu, Vipul Gupta, Rohit Aggarwal, Varsha Mahadevan, Bhaskar Krishnamachari

TL;DR
This paper introduces Cluster-based Adaptive Retrieval (CAR), a dynamic method for selecting the number of documents in RAG systems, improving relevance, efficiency, and user engagement over static approaches.
Contribution
CAR adaptively determines the optimal retrieval depth by analyzing clustering patterns in similarity scores, outperforming fixed top-k methods in multiple benchmarks.
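To make the idea of reading a cutoff from the similarity scores concrete, here is a minimal sketch. It is not the authors' algorithm, whose details this summary does not give. As a stand-in for clustering, it assumes a simple largest-gap heuristic: sort the scores in descending order and cut where the drop between neighbouring scores is biggest. The function name `adaptive_cutoff` and the parameters `min_k` and `max_k` are hypothetical, and the scores are made up for illustration.

```python
from typing import Optional, Sequence


def adaptive_cutoff(
    scores: Sequence[float],
    min_k: int = 1,
    max_k: Optional[int] = None,
) -> int:
    """Return how many top-ranked documents to keep.

    Assumption: the largest drop between consecutive sorted scores
    separates a "relevant" cluster from the rest. This is a simple
    heuristic for illustration, not the method from the paper.
    """
    ranked = sorted(scores, reverse=True)
    n = len(ranked)
    if n == 0:
        return 0

    upper = n if max_k is None else min(max_k, n)
    lower = max(1, min(min_k, upper))
    if lower >= upper:
        return upper

    # Keeping k documents means cutting between ranked[k - 1] and ranked[k].
    best_k, best_gap = upper, 0.0
    for k in range(lower, min(upper, n - 1) + 1):
        gap = ranked[k - 1] - ranked[k]
        if gap > best_gap:
            best_k, best_gap = k, gap
    return best_k


# Illustrative (invented) similarity scores.
focused_query = [0.91, 0.89, 0.52, 0.50, 0.48, 0.47]
broad_query = [0.80, 0.78, 0.77, 0.75, 0.74, 0.40]

print(adaptive_cutoff(focused_query))  # few documents before the big drop
print(adaptive_cutoff(broad_query))    # more documents before the big drop
```

With a fixed top-k, both queries would receive the same number of documents. Under this heuristic the focused query keeps only the small leading group, while the broad query keeps the longer run of similar scores. If all scores are equal, no gap exists and the sketch falls back to keeping everything up to `max_k`.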
Findings
CAR achieves the highest TES scores on the evaluated benchmarks.
Reduces LLM token usage by 60%.
Cuts latency by 22% and hallucinations by 10%.
Abstract
Retrieval-Augmented Generation (RAG) enhances large language models (LLMs) by pulling in external material (documents, code, manuals) from vast and ever-growing corpora to answer user queries effectively. The effectiveness of RAG depends significantly on aligning the number of retrieved documents with query characteristics: narrowly focused queries typically require fewer, highly relevant documents, whereas broader or ambiguous queries benefit from retrieving more extensive supporting information. However, the common static top-k retrieval approach fails to adapt to this variability, resulting in either insufficient context from too few documents or redundant information from too many. Motivated by these challenges, we introduce Cluster-based Adaptive Retrieval (CAR), an algorithm that dynamically determines the optimal number of documents by analyzing the clustering patterns of ordered…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Natural Language Processing Techniques
