Compactor: Calibrated Query-Agnostic KV Cache Compression with Approximate Leverage Scores

Vivek Chari; Benjamin Van Durme

arXiv:2507.08143·cs.CL·December 10, 2025

TL;DR

Compactor is a query-agnostic KV cache compression method for LLMs that uses approximate leverage scores to decide which tokens to retain. With context-calibrated compression, it reduces KV memory by 68% on average on LongBench while matching full-KV performance.

Contribution

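We introduce Compactor, a training-free, query-agnostic KV cache compression strategy that ranks tokens by approximate leverage scores, together with a context-calibrated procedure that infers the maximum compression a given context supports before significant performance loss.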

Findings

01

Matches the performance of competing methods while retaining 20% fewer tokens.

02

Reduces KV memory by 68% on average on LongBench while matching full-KV performance.

03

Demonstrates effectiveness across 27 diverse tasks and models.

Abstract

Modern Large Language Models (LLMs) are increasingly trained to support very large context windows. We present Compactor, a training-free, query-agnostic KV compression strategy that uses approximate leverage scores to determine token importance. We show that Compactor can achieve the same performance as competing methods while retaining 20% fewer tokens in both synthetic and real-world context tasks, and that it is more task-robust. We further introduce a procedure for context-calibrated compression: inferring the maximum compression a given context supports before significant performance loss. Using context-calibrated compression, we show that Compactor achieves full KV performance on LongBench while reducing the KV memory burden by 68%, on average. To demonstrate the efficacy and generalizability of our approach, we apply Compactor to 27 synthetic and real-world tasks from RULER and…
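The abstract does not spell out how the leverage scores are computed, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes token importance is the statistical leverage score of each row of a key matrix K (tokens by head dimension), approximated with a generic Gaussian sketch. The function names, the `sketch_size` and `retain` parameters, and the use of keys alone are all assumptions made for illustration.

```python
import numpy as np

def leverage_scores(K):
    """Exact leverage scores of the rows of K (n_tokens x d), via reduced QR."""
    Q, _ = np.linalg.qr(K)          # Q has orthonormal columns spanning col(K)
    return np.sum(Q ** 2, axis=1)   # score_i = squared norm of row i of Q

def approx_leverage_scores(K, sketch_size, rng):
    """Approximate leverage scores with a Gaussian sketch (assumed technique).

    sketch_size should be at least d so that R below is square and invertible.
    """
    n, _ = K.shape
    S = rng.standard_normal((sketch_size, n)) / np.sqrt(sketch_size)
    _, R = np.linalg.qr(S @ K)      # R is d x d, computed from the small sketch
    B = np.linalg.solve(R.T, K.T).T # B = K @ inv(R), rows nearly orthonormalized
    return np.sum(B ** 2, axis=1)

def keep_top_tokens(K, V, scores, retain):
    """Keep the highest-scoring fraction `retain` of tokens, in original order."""
    k = max(1, int(round(retain * K.shape[0])))
    idx = np.sort(np.argsort(scores)[-k:])
    return K[idx], V[idx], idx

# Hypothetical usage: 512 cached tokens, head dimension 64, keep 32% of them.
rng = np.random.default_rng(0)
K_demo = rng.standard_normal((512, 64))
V_demo = rng.standard_normal((512, 64))
scores_demo = approx_leverage_scores(K_demo, sketch_size=128, rng=rng)
K_small, V_small, kept = keep_top_tokens(K_demo, V_demo, scores_demo, retain=0.32)
```

Because the scores depend only on the cached keys and not on any later question, a ranking of this kind is query-agnostic in the sense the abstract describes. The retention fraction here is fixed by hand; the context-calibrated procedure mentioned above would instead choose it per context, and its details are not given in this summary.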

Taxonomy

Methods: LLaMA