CacheRAG: A Semantic Caching System for Retrieval-Augmented Generation in Knowledge Graph Question Answering

Yushi Sun; Lei Chen

arXiv:2604.26176·cs.DB·April 30, 2026

TL;DR

CacheRAG introduces a cache-augmented architecture for LLM-based KGQA, improving retrieval efficiency and accuracy by leveraging semantic caching and diversity optimization.

Contribution

The paper presents a novel cache system with a semantic parsing framework, diversity-aware cache retrieval, and heuristic expansion to enhance LLM-based KGQA performance.

Findings

01. Achieves +13.2% accuracy on CRAG dataset

02. Improves truthfulness by +17.5%

03. Outperforms state-of-the-art baselines

Abstract

The integration of Large Language Models (LLMs) with Retrieval-Augmented Generation (RAG) has significantly advanced Knowledge Graph Question Answering (KGQA). However, existing LLM-driven KGQA systems act as stateless planners, generating retrieval plans in isolation without exploiting historical query patterns, analogous to a database system that optimizes every query from scratch without a plan cache. This fundamental design flaw leads to schema hallucinations and limited retrieval coverage. We propose CacheRAG, a systematic cache-augmented architecture for LLM-based KGQA that transforms stateless planners into continual learners. Unlike traditional database plan caching (which optimizes for frequency), CacheRAG introduces three novel design principles tailored for LLM contexts: (1) Schema-agnostic user interface: A two-stage semantic parsing framework via Intermediate Semantic…
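To make the plan-cache analogy concrete, the following is a minimal illustrative sketch of a semantic cache that stores past question embeddings alongside their retrieval plans and returns a relevant but non-redundant set of cached plans. It is not the CacheRAG implementation, which the abstract does not detail here. The names `CacheEntry`, `PlanCache`, `retrieve`, and the trade-off parameter `lam` are hypothetical, the embeddings are assumed to come from some external sentence encoder, and the diversity step uses maximal marginal relevance (MMR) as a stand-in for whatever diversity objective the paper actually uses.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CacheEntry:
    """One cached (question, retrieval plan) pair. Hypothetical structure."""
    question: str
    plan: str
    embedding: Sequence[float]  # assumed to come from an external encoder


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class PlanCache:
    """Toy semantic cache with MMR-based retrieval (an assumption, not the paper's method)."""

    def __init__(self) -> None:
        self.entries: List[CacheEntry] = []

    def add(self, question: str, plan: str, embedding: Sequence[float]) -> None:
        self.entries.append(CacheEntry(question, plan, embedding))

    def retrieve(self, query_embedding: Sequence[float],
                 k: int = 2, lam: float = 0.5) -> List[CacheEntry]:
        """Greedily pick up to k entries.

        Each step maximizes lam * relevance - (1 - lam) * redundancy, where
        redundancy is the highest similarity to an already selected entry.
        lam = 1.0 reduces to plain top-k by similarity.
        """
        candidates = list(self.entries)
        selected: List[CacheEntry] = []
        while candidates and len(selected) < k:
            def mmr_score(entry: CacheEntry) -> float:
                relevance = cosine(query_embedding, entry.embedding)
                redundancy = max(
                    (cosine(entry.embedding, s.embedding) for s in selected),
                    default=0.0,
                )
                return lam * relevance - (1.0 - lam) * redundancy

            best = max(candidates, key=mmr_score)
            selected.append(best)
            candidates.remove(best)
        return selected
```

In this sketch, lowering `lam` trades some similarity to the incoming question for variety among the returned plans, so that near-duplicate cached questions do not crowd out a structurally different plan. The retrieved plans would then be supplied to the LLM planner as examples; how CacheRAG does this in practice is described in the full paper, not here.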
