CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering
Yushi Sun, Lei Chen

TL;DR
CacheRAG introduces a cache-augmented architecture for LLM-based KGQA, improving retrieval efficiency and accuracy by leveraging semantic caching and diversity optimization.
Contribution
It presents a novel cache system with a semantic parsing framework, diversity-aware cache retrieval, and heuristic expansion to enhance LLM-based KGQA performance.
Findings
Improves accuracy by 13.2% on the CRAG dataset
Improves truthfulness by 17.5%
Outperforms state-of-the-art baselines
Abstract
The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns. This is analogous to a database system that optimizes every query from scratch without a plan cache. This fundamental design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic…
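To make the plan-cache analogy concrete, here is a minimal Python sketch of a cache that reuses a stored retrieval plan when a new question resembles a past one. It is an illustration only, not CacheRAG's design: the `PlanCache` class, the token-overlap (Jaccard) similarity, the `0.5` threshold, and the plan strings are all assumptions made for this example, and the paper's semantic parsing and diversity-aware retrieval are not modeled.

```python
from dataclasses import dataclass, field
from typing import Optional


def tokenize(text: str) -> set:
    """Lowercase a question and split it into a set of word tokens."""
    return set(text.lower().replace("?", " ").split())


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


@dataclass
class PlanCache:
    """Hypothetical cache mapping past questions to retrieval plans."""
    threshold: float = 0.5  # assumed similarity cutoff for a cache hit
    entries: list = field(default_factory=list)  # (token set, plan) pairs

    def put(self, question: str, plan: str) -> None:
        """Store the plan that was used to answer a question."""
        self.entries.append((tokenize(question), plan))

    def get(self, question: str) -> Optional[str]:
        """Return the plan of the most similar cached question, or None."""
        query = tokenize(question)
        best_plan, best_score = None, 0.0
        for tokens, plan in self.entries:
            score = jaccard(query, tokens)
            if score > best_score:
                best_plan, best_score = plan, score
        return best_plan if best_score >= self.threshold else None


cache = PlanCache(threshold=0.5)
cache.put("Who directed the movie Inception?", "movie -> directed_by -> person")

# A structurally similar question reuses the cached plan.
hit = cache.get("Who directed the movie Interstellar?")
# An unrelated question falls below the threshold and misses.
miss = cache.get("What is the population of France?")
print(hit, miss)
```

A stateless planner corresponds to calling the LLM on every question; in this sketch, a hit would instead let the system start from a plan that already worked on a similar query. A real system would likely replace the token overlap with embedding-based similarity.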
