Learned Structure in Cartridges: Keys as Shareable Routers in Self-Studied Representations
Maurizio Diaz

TL;DR
This paper investigates the structure of learned Cartridge key-value caches in large language models. It finds that Cartridge keys act as stable, shareable retrieval routers, while most of the learned compression occurs in the value vectors, with implications for efficient long-context inference.
Contribution
The paper provides the first mechanistic analysis of the learned Cartridge cache structure, proposes that keys act as stable routers, and introduces Sampled Chunk Initialization for faster convergence.
Findings
Cartridge keys act as shareable retrieval routers across tasks.
Most learned compression occurs within Cartridge value vectors.
Sampled Chunk Initialization improves Cartridge training speed.
Abstract
A bottleneck for long-context LLM inference is the KV cache, which grows linearly with context length. Recent work has proposed Cartridges, an approach that uses offline compute to train a much smaller KV cache than is typically required for a full document (up to 40x less memory usage at inference time). In this paper, we present the first mechanistic exploration of the learned Cartridge key-value cache structure. In particular, we propose that (1) Cartridge keys act as stable, shareable retrieval routers for the compressed corpora and (2) most of the learned compression occurs within the Cartridge value vectors. We present empirical evidence of our routing theory across tasks, model families, and model sizes; for example, we can ablate the learned Cartridge key vectors between tasks with little performance loss. Finally, we propose a slight improvement in initialization called Sampled Chunk…
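
To make the key/value split described in the abstract concrete, here is a minimal NumPy sketch of single-head attention over a small, fixed KV cache. It is not the paper's code or experimental setup: the function `attend`, the two toy "cartridges" (`keys_a`/`values_a`, `keys_b`/`values_b`), and the random data are all illustrative assumptions. It shows only the mechanical fact behind the routing view: the attention weights depend on the queries and keys alone, while the content that gets mixed comes entirely from the values.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attend(queries, keys, values):
    """Single-head scaled dot-product attention over a fixed KV cache.

    Returns (outputs, weights). The weights are the "routing" step and
    depend only on queries and keys; the outputs mix the values.
    """
    d = keys.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights

rng = np.random.default_rng(0)
n_slots, d_model, n_queries = 8, 16, 4  # hypothetical toy sizes

# Two hypothetical caches standing in for Cartridges trained on two tasks.
keys_a, values_a = rng.normal(size=(2, n_slots, d_model))
keys_b, values_b = rng.normal(size=(2, n_slots, d_model))
queries = rng.normal(size=(n_queries, d_model))

out_a, w_a = attend(queries, keys_a, values_a)        # cartridge A as is
out_b, w_b = attend(queries, keys_b, values_b)        # cartridge B as is
out_swap, w_swap = attend(queries, keys_b, values_a)  # B's keys, A's values

# Routing follows the keys: the swapped cache routes exactly like B.
print(np.allclose(w_swap, w_b))
# Content follows the values: the swapped output is still a weighted
# average of A's value vectors.
print(np.allclose(out_swap, w_b @ values_a))
```

In this toy setting the keys are random, so swapping them changes the routing arbitrarily. The paper's claim is an empirical one about trained Cartridges: that their learned keys are similar enough across tasks for such a swap to cost little performance. The sketch does not test that claim.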
