ProphetKV: User-Query-Driven Selective Recomputation for Efficient KV Cache Reuse in Retrieval-Augmented Generation
Shihao Wang, Jiahao Chen, Yanqi Pan, Hao Huang, Yichen Hao, Xiangyu Zou, Wen Xia, Wentao Zhang, Chongyang Qiu, and Pengfei Wang

TL;DR
ProphetKV is a user-query-driven method that selectively recomputes key-value caches in retrieval-augmented generation, significantly improving inference accuracy and efficiency by focusing on query-relevant tokens.
Contribution
It introduces dynamic, query-focused token prioritization and a dual-stage recomputation pipeline to optimize KV cache reuse in RAG, reducing recomputation overhead while maintaining high accuracy.
Findings
Retains 96%-101% of full-prefill accuracy with only 20% recomputation.
Achieves 8.8%-24.9% accuracy improvement on RULER.
Achieves 18.6%-50.9% accuracy improvement on LongBench.
Abstract
The prefill stage of long-context Retrieval-Augmented Generation (RAG) is severely bottlenecked by computational overhead. To mitigate this, recent methods assemble the precomputed KV caches of the documents retrieved for a user query and recompute selected tokens to recover cross-attention between those caches. However, we identify a fundamental "crowding-out effect" in current token selection criteria: globally salient but user-query-irrelevant tokens saturate the limited recomputation budget, displacing the tokens truly essential for answering the user query and degrading inference accuracy. We propose ProphetKV, a user-query-driven KV cache reuse method for RAG scenarios. ProphetKV dynamically prioritizes tokens based on their semantic relevance to the user query and employs a dual-stage recomputation pipeline to fuse layer-wise attention metrics into a…
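The crowding-out effect described in the abstract can be illustrated with a toy example. The sketch below is not the authors' implementation: the function names (`select_tokens`, `query_relevance`), the mean-attention scoring rule, and all numbers are assumptions chosen for illustration only. It shows how, under the same 20% recomputation budget, ranking tokens by a global salience score and ranking them by query-to-context attention can select entirely different tokens.

```python
import math

def select_tokens(scores, budget_ratio):
    """Return the (sorted) indices of the highest-scoring tokens that fit
    within the recomputation budget. Hypothetical helper for illustration."""
    budget = max(1, math.floor(len(scores) * budget_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:budget])

def query_relevance(query_attention):
    """Average the attention each context token receives from the query tokens.
    query_attention[q][t] is the weight from query token q to context token t.
    Mean aggregation is an assumption, not the paper's scoring rule."""
    n_query = len(query_attention)
    n_ctx = len(query_attention[0])
    return [sum(row[t] for row in query_attention) / n_query for t in range(n_ctx)]

# Toy context of 10 tokens. Tokens 0 and 1 are globally salient,
# but the (made-up) query attends mostly to tokens 4 and 7.
global_salience = [0.9, 0.8, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1]
query_attention = [
    [0.02, 0.02, 0.02, 0.02, 0.40, 0.02, 0.02, 0.44, 0.02, 0.02],
    [0.02, 0.02, 0.02, 0.02, 0.44, 0.02, 0.02, 0.40, 0.02, 0.02],
]

global_pick = select_tokens(global_salience, budget_ratio=0.2)
query_pick = select_tokens(query_relevance(query_attention), budget_ratio=0.2)

print(global_pick)  # [0, 1]
print(query_pick)   # [4, 7]
```

In this toy setting the global criterion spends the whole budget on tokens 0 and 1, so the query-relevant tokens 4 and 7 are never recomputed. That is the displacement the abstract calls crowding-out.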
Taxonomy
Topics: Information Retrieval and Search Behavior · Caching and Content Delivery · Advanced Data Storage Technologies
