CaGR-RAG: Context-aware Query Grouping for Disk-based Vector Search in RAG Systems

Yeonwoo Jeong; Kyuli Park; Hyunji Cho; Sungyong Park

arXiv:2505.01164 · cs.DC · May 5, 2025

TL;DR

CaGR-RAG is a novel query grouping method for disk-based vector search in RAG systems that improves cache efficiency and reduces latency by organizing queries based on shared cluster access patterns.

Contribution

The paper introduces a context-aware query grouping and prefetching mechanism that optimizes disk access patterns in vector search systems.

Findings

01. Reduces 99th-percentile tail latency by up to 51.55%.

02. Maintains a higher cache hit ratio than baseline methods.

03. Improves overall retrieval performance in RAG systems.

Abstract

Modern embedding models capture both semantic and syntactic structures of queries, often mapping different queries to similar regions in vector space. This results in non-uniform cluster access patterns in disk-based vector search systems, particularly in Retrieval-Augmented Generation (RAG) frameworks. While existing approaches optimize individual queries, they overlook the impact of cluster access patterns and fail to account for the locality effects of queries that access similar clusters. This oversight reduces cache efficiency and increases search latency due to excessive disk I/O. To address this, we introduce CaGR-RAG, a context-aware query grouping mechanism that organizes queries based on shared cluster access patterns. Additionally, it incorporates opportunistic cluster prefetching to minimize cache misses during transitions between query groups, further optimizing retrieval…
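
The grouping idea described in the abstract can be illustrated with a minimal sketch. The abstract does not state the grouping criterion, so this example assumes Jaccard similarity over each query's set of cluster IDs, a greedy assignment with a fixed threshold, and a plain LRU cache of clusters. The names `group_queries`, `ClusterCache`, and `run` are hypothetical and are not taken from the paper, and prefetching is not modeled here.

```python
from collections import OrderedDict


def jaccard(a, b):
    """Jaccard similarity between two sets of cluster IDs."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_queries(query_clusters, threshold=0.5):
    """Greedily group queries whose cluster sets overlap (assumed criterion).

    query_clusters maps a query ID to the set of cluster IDs it will probe.
    """
    groups = []  # each group: {"queries": [...], "clusters": set(...)}
    for qid, clusters in query_clusters.items():
        best, best_sim = None, threshold
        for group in groups:
            sim = jaccard(clusters, group["clusters"])
            if sim >= best_sim:
                best, best_sim = group, sim
        if best is None:
            groups.append({"queries": [qid], "clusters": set(clusters)})
        else:
            best["queries"].append(qid)
            best["clusters"] |= clusters
    return groups


class ClusterCache:
    """LRU cache of cluster IDs; a miss stands in for a disk read."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, cid):
        if cid in self.entries:
            self.entries.move_to_end(cid)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1
        self.entries[cid] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used


def run(order, query_clusters, capacity):
    """Replay queries in the given order and return (hits, misses)."""
    cache = ClusterCache(capacity)
    for qid in order:
        for cid in sorted(query_clusters[qid]):
            cache.access(cid)
    return cache.hits, cache.misses


# Toy workload: q1/q3 and q2/q4 probe overlapping clusters.
query_clusters = {
    "q1": {1, 2, 3},
    "q2": {7, 8, 9},
    "q3": {1, 2, 4},
    "q4": {7, 8, 10},
}
arrival_order = list(query_clusters)
grouped_order = [q for g in group_queries(query_clusters) for q in g["queries"]]

print(run(arrival_order, query_clusters, capacity=3))  # (0, 12)
print(run(grouped_order, query_clusters, capacity=3))  # (4, 8)
```

On this toy workload, replaying queries in arrival order thrashes the three-slot cache, while the grouped order reuses clusters loaded by the previous query in the same group. The numbers only illustrate the locality effect and say nothing about the paper's measured results.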

Taxonomy

Topics: Data Management and Algorithms · Advanced Database Systems and Queries · Time Series Analysis and Forecasting