Cache Mechanism for Agent RAG Systems

Shuhang Lin; Zhencan Peng; Lingyao Li; Xiao Lin; Xi Zhu; and Yongfeng Zhang

arXiv:2511.02919 · cs.CL · November 6, 2025

TL;DR

This paper introduces ARC, a caching framework for RAG systems that dynamically manages compact, high-relevance knowledge bases, significantly reducing storage and latency while improving answer accuracy in LLM agents.

Contribution

ARC is a novel, annotation-free cache management method that adapts to historical query patterns and to the embedding-space geometry of cached items, improving both the efficiency and the effectiveness of RAG systems.

Findings

1. Reduces storage to 0.015% of the original corpus
2. Achieves a has-answer rate of up to 79.8%
3. Reduces retrieval latency by 80%

Abstract

Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance, agent-level cache management, particularly constructing, maintaining, and updating a compact, relevant corpus dynamically tailored to each agent's needs, remains underexplored. We therefore introduce ARC (Agent RAG Cache Mechanism), a novel, annotation-free caching framework that dynamically manages small, high-value corpora for each agent. By synthesizing historical query distribution patterns with the intrinsic geometry of cached items in the embedding space, ARC automatically maintains a high-relevance cache. Comprehensive experiments on three retrieval datasets demonstrate that ARC reduces storage requirements to…
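
The abstract describes ARC only at a high level: cached items are kept or evicted by combining the historical query distribution with the geometry of the items in embedding space. The following is a hypothetical toy that illustrates that general idea; it is not the paper's algorithm. The class name `ToyAgentCache`, the exponentially decayed hit counter used as the "query distribution" signal, the running query centroid used as the "geometry" signal, and the weights `alpha` and `decay` are all assumptions made for illustration.

```python
from __future__ import annotations

import math
from dataclasses import dataclass, field
from typing import Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two dense vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


@dataclass
class CachedItem:
    text: str
    emb: list[float]
    demand: float = 0.0  # exponentially decayed count of past retrieval hits


@dataclass
class ToyAgentCache:
    """Hypothetical capacity-bounded agent cache (illustration only, not ARC)."""

    capacity: int
    decay: float = 0.9  # demand multiplier applied once per incoming query
    alpha: float = 0.5  # weight on demand; (1 - alpha) weights the geometric term
    items: dict[str, CachedItem] = field(default_factory=dict)
    centroid: Optional[list[float]] = None  # running mean of query embeddings
    n_queries: int = 0

    def _update_centroid(self, q: list[float]) -> None:
        self.n_queries += 1
        if self.centroid is None:
            self.centroid = list(q)
        else:
            # Incremental mean: c <- c + (q - c) / n
            self.centroid = [c + (x - c) / self.n_queries
                             for c, x in zip(self.centroid, q)]

    def score(self, item: CachedItem) -> float:
        """Blend historical demand with closeness to where queries usually land."""
        geom = cosine(item.emb, self.centroid) if self.centroid else 0.0
        return self.alpha * item.demand + (1 - self.alpha) * geom

    def query(self, q: list[float], k: int = 1) -> list[str]:
        """Return the top-k cached texts for query embedding q and record the hits."""
        self._update_centroid(q)
        for it in self.items.values():
            it.demand *= self.decay
        hits = sorted(self.items.values(),
                      key=lambda it: cosine(it.emb, q), reverse=True)[:k]
        for it in hits:
            it.demand += 1.0
        return [it.text for it in hits]

    def admit(self, key: str, text: str, emb: list[float]) -> None:
        """Insert (or overwrite) an item; evict the lowest-scoring one if over capacity."""
        self.items[key] = CachedItem(text, list(emb))
        if len(self.items) > self.capacity:
            victim = min(self.items, key=lambda k: self.score(self.items[k]))
            del self.items[victim]
```

In this sketch a newly admitted item starts with zero demand, so whether it survives eviction depends only on how close it lies to the running query centroid. For example, with `capacity=2`, after two queries near `[1, 0]` have both hit an item at `[1, 0]`, admitting a third item at `[0.8, 0.6]` evicts the never-retrieved item at `[0, 1]` rather than the newcomer.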

Taxonomy

Topics: Information Retrieval and Search Behavior · Topic Modeling · Multimodal Machine Learning Applications