Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection
Andrey Pustovit

TL;DR
Knowledge Packs use pre-computed key-value (KV) caches to deliver knowledge at zero token cost, enabling efficient and steerable knowledge delivery in causal transformers.
Contribution
This paper introduces Knowledge Packs, a method for zero-token knowledge delivery via KV cache injection, and reports token savings of up to 95% along with behavioral steering that needs no training or weight modification.
Findings
Zero divergences across 700 questions on Qwen3-8B and Llama-3.1-8B when the chat template is formatted correctly
Up to 95% token savings in experiments
Behavioral steering achieved without training or weight modification
Abstract
RAG wastes tokens. We propose Knowledge Packs: pre-computed KV caches that deliver the same knowledge at zero token cost. For causal transformers, the KV cache from a forward pass on text F is identical to what a joint pass on F+q would produce; this follows directly from the causal mask. The equivalence is exact but fragile: wrong chat template formatting causes a degradation of 6-7 percentage points, which we believe explains prior claims of KV outperforming RAG. With correct formatting, we observe zero divergences across 700 questions on Qwen3-8B and Llama-3.1-8B, with up to 95% token savings. The KV interface also enables behavioral steering that RAG cannot do. Because RoPE rotates keys but leaves values untouched, contrastive deltas on cached values can nudge model behavior, whereas key arithmetic destroys coherence. The effect sits in mid-layer values (33-66%), independent directions are nearly orthogonal (cos ≈ 0) and…
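The causal-mask argument in the abstract can be checked on a toy model. The sketch below is not the paper's code: it assumes a hypothetical two-layer, single-head attention stack written in NumPy, with random weights and no RoPE, layer norm, or MLP. It computes the per-layer (K, V) cache for a prefix F alone, then for the concatenation F+q, and compares the first len(F) rows of each.

```python
import numpy as np

def causal_attention_layer(x, Wq, Wk, Wv):
    """Toy single-head causal self-attention with a residual connection.
    Returns the layer output and this layer's (K, V) cache."""
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(k.shape[-1])
    # Causal mask: position i may not attend to positions j > i.
    future = np.triu(np.ones_like(scores, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return x + weights @ v, (k, v)

def forward(x, layers):
    """Run all layers and collect the per-layer KV cache."""
    cache = []
    for Wq, Wk, Wv in layers:
        x, kv = causal_attention_layer(x, Wq, Wk, Wv)
        cache.append(kv)
    return x, cache

rng = np.random.default_rng(0)
d = 8  # hypothetical hidden size
layers = [tuple(rng.normal(size=(d, d)) * 0.3 for _ in range(3)) for _ in range(2)]

F = rng.normal(size=(5, d))      # stand-in embeddings for the knowledge text F
query = rng.normal(size=(3, d))  # stand-in embeddings for the question q

_, cache_F = forward(F, layers)
_, cache_joint = forward(np.concatenate([F, query]), layers)

# The prefix rows of the joint cache should match the cache computed on F alone.
prefix_matches = all(
    np.allclose(kF, kJ[: len(F)]) and np.allclose(vF, vJ[: len(F)])
    for (kF, vF), (kJ, vJ) in zip(cache_F, cache_joint)
)
print(prefix_matches)
```

In this toy setting the prefix rows agree at every layer, because no prefix position ever attends to a later token. The sketch says nothing about chat template formatting, which the abstract identifies as the fragile part in real models.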
