KVCache Cache in the Wild: Characterizing and Optimizing KVCache Cache at a Large Cloud Provider
Jiahao Wang, Jinbo Han, Xingda Wei, Sijie Shen, Dingyan Zhang, Chenguang Fang, Rong Chen, Wenyuan Yu, Haibo Chen

TL;DR
This paper systematically characterizes KVCache (KV$) workload patterns in large language model serving at a major cloud provider, and uses the resulting insights to motivate workload-aware cache eviction policies that improve serving performance.
Contribution
It provides the first detailed analysis of real-world KV$ workloads in LLM serving, informing cache management strategies for better efficiency.
Findings
KV$ reuses are skewed across requests
Reuse patterns are predictable within request categories
Moderate cache size suffices for high hit ratios
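The second finding suggests that an eviction policy can use a request's category as a predictor of future reuse. The following is a minimal sketch of that idea, not the paper's algorithm: the class name `CategoryAwareCache`, the category labels, and the smoothed hit-rate score are all assumptions made for illustration. It evicts the entry whose category has the lowest observed reuse rate and breaks ties by least-recent access.

```python
from collections import defaultdict


class CategoryAwareCache:
    """Toy cache that evicts entries from the category least likely to be reused.

    Hypothetical illustration only; the scoring rule is an assumption,
    not the policy proposed in the paper.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.entries = {}                 # key -> (category, last access tick)
        self.hits = defaultdict(int)      # per-category hit count
        self.lookups = defaultdict(int)   # per-category lookup count
        self.tick = 0

    def reuse_rate(self, category):
        # Smoothed per-category hit ratio, so unseen categories start at 0.5.
        return (self.hits[category] + 1) / (self.lookups[category] + 2)

    def access(self, key, category):
        """Look up key, insert it on a miss, and return True on a hit."""
        self.tick += 1
        self.lookups[category] += 1
        hit = key in self.entries
        if hit:
            self.hits[category] += 1
        elif len(self.entries) >= self.capacity:
            self._evict()
        self.entries[key] = (category, self.tick)
        return hit

    def _evict(self):
        # Lowest category reuse rate first; oldest access breaks ties.
        victim = min(
            self.entries,
            key=lambda k: (self.reuse_rate(self.entries[k][0]), self.entries[k][1]),
        )
        del self.entries[victim]


cache = CategoryAwareCache(capacity=2)
cache.access("a", "chat")   # miss
cache.access("a", "chat")   # hit: the "chat" category now looks reusable
cache.access("b", "api")    # miss
cache.access("c", "api")    # miss: cache is full, so one entry is evicted
print(sorted(cache.entries))
```

In this trace a plain LRU policy would evict "a", the least recently used entry, whereas the category-aware score keeps it because its category has shown reuse and evicts "b" instead.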
Abstract
Serving large language models (LLMs) is important for cloud providers, and caching intermediate results (KV$) after processing each request substantially improves serving throughput and latency. However, there is limited understanding of how LLM serving benefits from KV$ caching, where system design decisions like cache eviction policies are highly workload-dependent. In this paper, we present the first systematic characterization of the KV$ workload patterns from one of the leading LLM service providers. We draw observations that were not covered by previous studies focusing on synthetic workloads, including: KV$ reuses are skewed across requests, where reuses between single-turn requests are as important as those between multi-turn requests; the reuse time and probability are diverse considering all requests, but for a specific request category, the pattern tends to be predictable; and the…
Taxonomy
Topics: Caching and Content Delivery · Advanced Data Storage Technologies
