CaGR-RAG: Context-aware Query Grouping for Disk-based Vector Search in RAG Systems
Yeonwoo Jeong, Kyuli Park, Hyunji Cho, Sungyong Park

TL;DR
CaGR-RAG is a novel query grouping method for disk-based vector search in RAG systems that improves cache efficiency and reduces latency by organizing queries based on shared cluster access patterns.
Contribution
It introduces a context-aware query grouping and prefetching mechanism that optimizes disk access patterns in vector search systems.
Findings
Reduces 99th percentile tail latency by up to 51.55%.
Maintains a higher cache hit ratio than baseline methods.
Improves overall retrieval performance in RAG systems.
Abstract
Modern embedding models capture both semantic and syntactic structures of queries, often mapping different queries to similar regions in vector space. This results in non-uniform cluster access patterns in disk-based vector search systems, particularly in Retrieval-Augmented Generation (RAG) frameworks. While existing approaches optimize individual queries, they overlook the impact of cluster access patterns, failing to account for the locality effects of queries that access similar clusters. This oversight reduces cache efficiency and increases search latency due to excessive disk I/O. To address this, we introduce CaGR-RAG, a context-aware query grouping mechanism that organizes queries based on shared cluster access patterns. Additionally, it incorporates opportunistic cluster prefetching to minimize cache misses during transitions between query groups, further optimizing retrieval…
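The grouping idea described in the abstract can be illustrated with a minimal sketch. This is not the paper's algorithm, which is not specified in this summary. It assumes each query has already been mapped to the set of cluster IDs it would probe, and it uses a greedy pass with Jaccard similarity and a threshold of 0.5, both of which are illustrative choices. The names `jaccard`, `group_queries`, and `prefetch_candidates` are hypothetical.

```python
from typing import Dict, List, Set


def jaccard(a: Set[int], b: Set[int]) -> float:
    """Overlap between two cluster-ID sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_queries(
    cluster_sets: Dict[str, Set[int]], threshold: float = 0.5
) -> List[List[str]]:
    """Greedily place each query in the group whose clusters it overlaps most.

    Assumption: similarity metric and threshold are illustrative only.
    """
    groups: List[List[str]] = []
    group_clusters: List[Set[int]] = []  # union of clusters touched per group
    for qid, clusters in cluster_sets.items():
        best_idx, best_sim = -1, threshold
        for idx, seen in enumerate(group_clusters):
            sim = jaccard(clusters, seen)
            if sim >= best_sim:
                best_idx, best_sim = idx, sim
        if best_idx < 0:
            # No existing group is similar enough: start a new one.
            groups.append([qid])
            group_clusters.append(set(clusters))
        else:
            groups[best_idx].append(qid)
            group_clusters[best_idx] |= clusters
    return groups


def prefetch_candidates(next_group: Set[int], cached: Set[int]) -> Set[int]:
    """Clusters the next group needs that are not in the cache yet."""
    return next_group - cached


queries = {
    "q1": {1, 2, 3},
    "q2": {7, 8, 9},
    "q3": {2, 3, 4},
    "q4": {8, 9, 10},
}
groups = group_queries(queries)
to_prefetch = prefetch_candidates({7, 8, 9, 10}, cached={1, 2, 9})
```

Running queries group by group means consecutive queries mostly reuse clusters already in the cache, and the set returned by `prefetch_candidates` is what a prefetcher could load from disk before the next group starts.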
Taxonomy
Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Time Series Analysis and Forecasting
