KV Packet: Recomputation-Free Context-Independent KV Caching for LLMs

Chuangtao Chen; Grace Li Zhang; Xunzhao Yin; Cheng Zhuo; Bing Li; Ulf Schlichtmann

arXiv:2604.13226·cs.LG·April 20, 2026

TL;DR

KV Packet introduces a recomputation-free caching method for LLMs that uses immutable document packets and soft-token adapters, reducing latency and FLOPs while maintaining accuracy.

Contribution

The paper presents a cache reuse framework that eliminates recomputation by treating each cached document as an immutable packet wrapped in trainable soft-token adapters.

Findings

1. Requires near-zero FLOPs at cache-reuse time, in contrast to selective-recomputation methods.

2. Reduces Time-to-First-Token (TTFT) latency relative to those methods.

3. Maintains F1 scores comparable to full recomputation baselines.

Abstract

Large Language Models (LLMs) rely heavily on Key-Value (KV) caching to minimize inference latency. However, standard KV caches are context-dependent: reusing a cached document in a new context requires recomputing KV states to account for shifts in attention distribution. Existing solutions such as CacheBlend, EPIC, and SAM-KV mitigate this issue by selectively recomputing a subset of tokens, but they still incur non-negligible computational overhead (FLOPs) and increased Time-to-First-Token (TTFT) latency. In this paper, we propose KV Packet, a recomputation-free cache reuse framework that treats cached documents as immutable "packets" wrapped in lightweight trainable soft-token adapters, which are trained via self-supervised distillation to bridge context discontinuities. Experiments on Llama-3.1 and Qwen2.5 demonstrate that the proposed KV Packet method achieves near-zero…
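The packet idea in the abstract can be illustrated with a toy single-head example. This is only a sketch under stated assumptions, not the authors' implementation: the `Packet` class, the `make_packet`, `assemble`, and `attend` helpers, and the choice of placing soft tokens both before and after each document are all hypothetical, positional encoding is ignored, and random values stand in for real prefill outputs and for adapters that would be trained by distillation. What it shows is the reuse step: document KV entries are concatenated as stored, with no per-context recomputation.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class Packet:
    """Hypothetical container: one document's precomputed KV plus adapter KV."""
    head_k: np.ndarray  # (n_soft, d) soft-token keys placed before the document
    head_v: np.ndarray  # (n_soft, d) matching values
    doc_k: np.ndarray   # (n_doc, d) document keys, computed once, never updated
    doc_v: np.ndarray   # (n_doc, d) matching values
    tail_k: np.ndarray  # (n_soft, d) soft-token keys placed after the document
    tail_v: np.ndarray  # (n_soft, d) matching values

    def wrapped(self):
        """Return the packet's keys and values in header, document, trailer order."""
        k = np.concatenate([self.head_k, self.doc_k, self.tail_k], axis=0)
        v = np.concatenate([self.head_v, self.doc_v, self.tail_v], axis=0)
        return k, v


def make_packet(rng, n_doc, n_soft=2, d=8):
    """Build a packet from random stand-in values (assumption: no real model here)."""
    shapes = [(n_soft, d), (n_soft, d), (n_doc, d), (n_doc, d), (n_soft, d), (n_soft, d)]
    return Packet(*(rng.standard_normal(s) for s in shapes))


def assemble(packets):
    """Concatenate packets into one context cache without touching any entry."""
    ks, vs = zip(*(p.wrapped() for p in packets))
    return np.concatenate(ks, axis=0), np.concatenate(vs, axis=0)


def attend(q, k, v):
    """Scaled dot-product attention of query rows over the assembled cache."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v


rng = np.random.default_rng(0)
packets = [make_packet(rng, n_doc=5), make_packet(rng, n_doc=7)]
k, v = assemble(packets)
q = rng.standard_normal((3, 8))  # three query tokens
out = attend(q, k, v)
print(k.shape, out.shape)  # (20, 8) (3, 8)
```

In this toy setup the only per-request work is the concatenation and the query's own attention; the cost of producing `doc_k` and `doc_v` is paid once per document, which is the property the abstract describes as recomputation-free.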

Code & Models

Repositories

chuangtaochen-tum/KVPacket (GitHub)
