Knowledge Packs: Zero-Token Knowledge Delivery via KV Cache Injection

Andrey Pustovit

arXiv:2604.03270 · cs.CL · April 7, 2026

TL;DR

Knowledge Packs use pre-computed key-value (KV) caches to deliver knowledge at zero token cost, making knowledge delivery in causal transformers both efficient and steerable.

Contribution

This paper introduces Knowledge Packs, a method for zero-token knowledge delivery via KV cache injection, and reports up to 95% token savings along with behavioral steering through edits to cached values.

Findings

01. With correct chat template formatting, zero divergences across 700 questions on Qwen3-8B and Llama-3.1-8B

02. Up to 95% token savings in experiments

03. Behavioral steering achieved without training or weight modification
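
The steering finding can be pictured with a small sketch. This is an illustration under stated assumptions, not the paper's implementation: the helper names (`mid_layer_indices`, `contrastive_delta`, `steer_values`), the mean-difference definition of the delta, the scaling factor `alpha`, and the nested-list cache layout are all hypothetical. Only two points come from the paper's abstract: the edit is applied to cached values rather than keys, and the effect is reported in the middle third of the layers.

```python
def mid_layer_indices(n_layers):
    """Layers in the middle third of the stack (relative depth 1/3 to 2/3)."""
    return [i for i in range(n_layers) if n_layers <= 3 * i < 2 * n_layers]

def contrastive_delta(pos_values, neg_values):
    """Assumed definition: per-dimension mean of pos minus mean of neg."""
    dim = len(pos_values[0])
    def mean(vectors, d):
        return sum(v[d] for v in vectors) / len(vectors)
    return [mean(pos_values, d) - mean(neg_values, d) for d in range(dim)]

def steer_values(value_cache, deltas, alpha):
    """Return a copy of value_cache with alpha * delta added in chosen layers.

    value_cache: list over layers, each a list of value vectors (one per token).
    deltas: dict mapping layer index -> delta vector. Keys are never touched.
    """
    steered = []
    for layer, values in enumerate(value_cache):
        delta = deltas.get(layer)
        if delta is None:
            steered.append([list(v) for v in values])
        else:
            steered.append([[x + alpha * d for x, d in zip(v, delta)] for v in values])
    return steered

# Toy data: 6 layers, 2 cached tokens per layer, 2-dimensional values.
n_layers = 6
value_cache = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(n_layers)]
pos = [[1.0, 0.0], [3.0, 0.0]]   # values cached under the desired behavior
neg = [[0.0, 0.0], [0.0, 2.0]]   # values cached under the contrasting behavior

delta = contrastive_delta(pos, neg)
deltas = {layer: delta for layer in mid_layer_indices(n_layers)}
steered = steer_values(value_cache, deltas, alpha=0.5)
```

In a real model each layer would likely have its own delta, computed from that layer's cached values; the sketch reuses one vector for brevity.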

Abstract

RAG wastes tokens. We propose Knowledge Packs: pre-computed KV caches that deliver the same knowledge at zero token cost. For causal transformers, the KV cache from a forward pass on text F is identical to what a joint pass on F+q would produce - this follows directly from the causal mask. The equivalence is exact but fragile: wrong chat template formatting causes 6-7pp degradation, which we believe explains prior claims of KV outperforming RAG. With correct formatting: zero divergences across 700 questions on Qwen3-8B and Llama-3.1-8B, up to 95% token savings. The KV interface also enables behavioral steering that RAG cannot do. Because RoPE rotates keys but leaves values untouched, contrastive deltas on cached values can nudge model behavior while key arithmetic destroys coherence. The effect sits in mid-layer values (33-66%), independent directions are nearly orthogonal (cos~0) and…
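
The causal-mask equivalence claimed in the abstract can be checked on a toy model. The sketch below is a minimal illustration, not the paper's code: it assumes a small stack of single-head attention layers with residual connections and random weights, and it omits positional encoding, MLPs, and normalization. All names (`causal_layer`, `kv_cache`, `facts`, `question`) are made up for the example. It computes the per-layer keys and values once for the prefix F alone and once for F followed by q, then compares the prefix entries.

```python
import math
import random

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def causal_layer(xs, Wq, Wk, Wv):
    """One causal self-attention layer with a residual connection."""
    qs = [matvec(Wq, x) for x in xs]
    ks = [matvec(Wk, x) for x in xs]
    vs = [matvec(Wv, x) for x in xs]
    scale = 1.0 / math.sqrt(len(xs[0]))
    outs = []
    for i, q in enumerate(qs):
        # Causal mask: position i attends only to positions 0..i.
        scores = [scale * sum(a * b for a, b in zip(q, ks[j])) for j in range(i + 1)]
        m = max(scores)
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        attn = [sum(w * vs[j][d] for j, w in enumerate(weights)) / total
                for d in range(len(q))]
        outs.append([x_d + a_d for x_d, a_d in zip(xs[i], attn)])
    return outs, ks, vs

def kv_cache(xs, layers):
    """Run the stack and collect (keys, values) for every layer."""
    cache = []
    for Wq, Wk, Wv in layers:
        xs, ks, vs = causal_layer(xs, Wq, Wk, Wv)
        cache.append((ks, vs))
    return cache

def random_matrix(rng, dim):
    return [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(dim)]

rng = random.Random(0)
dim, n_layers = 4, 3
layers = [tuple(random_matrix(rng, dim) for _ in range(3)) for _ in range(n_layers)]
facts = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(5)]     # stands in for F
question = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(3)]  # stands in for q

cache_f = kv_cache(facts, layers)                 # forward pass on F only
cache_joint = kv_cache(facts + question, layers)  # joint pass on F + q

# The prefix entries of the joint cache should equal the F-only cache exactly.
prefix_matches = all(
    ks_f == ks_j[:len(facts)] and vs_f == vs_j[:len(facts)]
    for (ks_f, vs_f), (ks_j, vs_j) in zip(cache_f, cache_joint)
)
print(prefix_matches)
```

The check passes with exact equality because no prefix position ever reads from a later position, at any layer. On a real model, batched GPU kernels may introduce small floating-point differences, so a tolerance-based comparison would be the safer test there.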
