Cache Mechanism for Agent RAG Systems
Shuhang Lin, Zhencan Peng, Lingyao Li, Xiao Lin, Xi Zhu, and Yongfeng Zhang

TL;DR
This paper introduces ARC, a caching framework for RAG systems that dynamically manages compact, high-relevance knowledge bases, significantly reducing storage and latency while improving answer accuracy in LLM agents.
Contribution
ARC is a novel, annotation-free cache management method that adapts to query patterns and embedding geometry, enhancing RAG system efficiency and effectiveness.
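To make the idea of combining query patterns with embedding geometry concrete, here is a minimal sketch of a fixed-capacity passage cache. It is not the ARC algorithm, which this summary does not specify. The class name `SketchCache`, the rank-discounted demand score, the mean-cosine "centrality" term, and the weight `alpha` are all assumptions chosen for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class SketchCache:
    """Toy fixed-capacity passage cache (hypothetical; not the ARC algorithm)."""

    def __init__(self, capacity, alpha=0.5):
        self.capacity = capacity
        self.alpha = alpha        # assumed weight of the geometry term
        self.embeddings = {}      # passage id -> embedding vector
        self.demand = {}          # passage id -> accumulated query demand

    def record_hit(self, pid, embedding, rank):
        """Record that passage `pid` was retrieved at 0-based `rank` for a query."""
        self.embeddings[pid] = embedding
        # Assumed demand signal: higher-ranked retrievals count for more.
        self.demand[pid] = self.demand.get(pid, 0.0) + 1.0 / (rank + 1)
        self._evict()

    def centrality(self, pid):
        """Mean cosine similarity of `pid` to the other cached passages."""
        others = [e for k, e in self.embeddings.items() if k != pid]
        if not others:
            return 0.0
        return sum(cosine(self.embeddings[pid], e) for e in others) / len(others)

    def priority(self, pid):
        """Combine query demand with the geometry term."""
        return self.demand[pid] * (1.0 + self.alpha * self.centrality(pid))

    def _evict(self):
        """Drop lowest-priority passages until the cache fits its capacity."""
        while len(self.embeddings) > self.capacity:
            victim = min(self.embeddings, key=self.priority)
            del self.embeddings[victim]
            del self.demand[victim]
```

In this sketch, a passage survives eviction when it is retrieved often and sits close to other cached passages. No relevance labels are needed, which is the sense in which such a policy is annotation-free.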
Findings
Reduces storage to 0.015% of the original corpus
Increases the has-answer rate to as much as 79.8%
Decreases retrieval latency by 80%
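For readers unfamiliar with the metric, the has-answer rate is usually the fraction of queries for which at least one retrieved passage contains a gold answer string. The sketch below assumes case-insensitive substring matching. The paper's exact matching rule is not stated in this summary, and the function name `has_answer_rate` is hypothetical.

```python
def has_answer_rate(retrieved, answers):
    """Fraction of queries with at least one retrieved passage containing a gold answer.

    retrieved: list (one entry per query) of lists of passage strings.
    answers:   list (one entry per query) of lists of acceptable answer strings.
    """
    if not retrieved:
        return 0.0
    hits = 0
    for passages, golds in zip(retrieved, answers):
        # Assumed matching rule: case-insensitive substring containment.
        if any(g.lower() in p.lower() for p in passages for g in golds):
            hits += 1
    return hits / len(retrieved)
```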
Abstract
Recent advances in Large Language Model (LLM)-based agents have been propelled by Retrieval-Augmented Generation (RAG), which grants the models access to vast external knowledge bases. Despite RAG's success in improving agent performance, agent-level cache management, particularly constructing, maintaining, and updating a compact, relevant corpus dynamically tailored to each agent's needs, remains underexplored. Therefore, we introduce ARC (Agent RAG Cache Mechanism), a novel, annotation-free caching framework that dynamically manages small, high-value corpora for each agent. By synthesizing historical query distribution patterns with the intrinsic geometry of cached items in the embedding space, ARC automatically maintains a high-relevance cache. In comprehensive experiments on three retrieval datasets, ARC reduces storage requirements to…
Taxonomy
Topics: Information Retrieval and Search Behavior · Topic Modeling · Multimodal Machine Learning Applications
