CompilerKV: Risk-Adaptive KV Compression via Offline Experience Compilation
Ning Yang, Chengzhi Wang, Yibo Liu, Baoliang Tian, Haijun Zhang

TL;DR
CompilerKV is a risk-adaptive key-value (KV) cache compression method that compiles its retention policy offline. The compiled policy is cheap to apply online, transfers across corpora and models, and reaches state-of-the-art compression results on multiple language model architectures.
Contribution
The paper proposes an offline-compiled retention policy for KV compression that reduces online correction to table lookups plus a budget clamp, and shows that the compiled tables transfer across corpora and backbones while outperforming existing methods.
Findings
Achieves state-of-the-art compression on four backbones at 512 tokens.
Retention tables transfer effectively across different corpora and models.
Maintains strong performance under various pressure regimes and cache ratios.
Abstract
Prefill-only KV compression freezes a token subset at the end of prefill and decodes from it without further eviction. The retention decision is therefore irreversible, yet existing methods estimate the corrective signals it relies on (per-head reliability and prompt-level compression sensitivity) online from a single noisy prompt. We argue this is the wrong statistical unit: these signals exhibit far higher cross-prompt regularity than within-prompt signal-to-noise. We introduce CompilerKV, a KV-retention policy whose corrective tables are compiled offline from a calibration corpus, reducing online correction after the standard observation-window scan to lookups plus a budget clamp. We find that compiled retention tables behave as portable architectural priors: rankings transfer across disjoint corpora on four backbones (mean Spearman ), and direct…
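As a rough illustration of "lookups plus a budget clamp", the sketch below shows one way such an online step could look. It is not the authors' implementation: the function name `select_kv`, the shape of the tables, the sensitivity bucketing, and the weighted-sum scoring rule are all assumptions made for this example, since the abstract does not specify them.

```python
import numpy as np

def select_kv(scores, head_reliability, sensitivity_table, bucket,
              base_budget, min_budget, max_budget):
    """Hypothetical online step: two table lookups, a clamp, then top-k.

    scores:            (num_heads, seq_len) observation-window scores (assumed input)
    head_reliability:  (num_heads,) weights, assumed to be compiled offline
    sensitivity_table: (num_buckets,) budget multipliers, assumed compiled offline
    bucket:            index of the prompt's sensitivity bucket (assumed given)
    """
    seq_len = scores.shape[1]
    # Lookup 1: weight each head's scores by its offline reliability entry.
    token_score = (scores * head_reliability[:, None]).sum(axis=0)
    # Lookup 2: scale the base budget by the prompt-level multiplier.
    budget = int(round(base_budget * sensitivity_table[bucket]))
    # Budget clamp: stay within [min_budget, max_budget] and the prompt length.
    budget = min(max(budget, min_budget), max_budget, seq_len)
    # Keep the highest-scoring tokens; return positions in prompt order.
    keep = np.argsort(-token_score, kind="stable")[:budget]
    return np.sort(keep)

# Toy usage with made-up numbers (2 heads, 4 prompt tokens).
scores = np.array([[0.1, 0.9, 0.2, 0.8],
                   [0.7, 0.1, 0.6, 0.2]])
head_reliability = np.array([1.0, 0.5])
sensitivity_table = np.array([0.5, 1.0, 1.5])
kept = select_kv(scores, head_reliability, sensitivity_table,
                 bucket=1, base_budget=2, min_budget=2, max_budget=3)
print(kept.tolist())
```

The point of the sketch is only that nothing here is estimated from the prompt beyond the observation-window scores: both corrective quantities are read from precomputed tables, so the retained set is fixed once at the end of prefill.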
Taxonomy
Topics: Big Data and Digital Economy · Parallel Computing and Optimization Techniques · Cloud Computing and Resource Management
