eOptShrinkQ: Near-Lossless KV Cache Compression Through Optimal Spectral Denoising and Quantization

Pei-Chun Su

arXiv:2605.02905·cs.LG·May 6, 2026

TL;DR

eOptShrinkQ is a two-stage compression method for transformer KV caches. It first applies spectral denoising (optimal singular value shrinkage) to separate out shared low-rank structure, then quantizes the residual, achieving near-lossless compression and improved retrieval performance.

Contribution

It introduces a KV cache compression pipeline built on spectral denoising and backed by theoretical guarantees, which outperforms existing quantization methods such as TurboQuant on transformer models.

Findings

1. eOptShrinkQ saves nearly one bit per entry over TurboQuant at similar quality.
2. At 2.2 bits per entry, it outperforms TurboQuant on LongBench tasks.
3. Spectral denoising acts as a regularizer, improving performance on retrieval tasks.
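The findings are stated in bits per entry. For a two-stage scheme, such a figure has to amortize the cost of storing the shared low-rank component across the whole cache. The sketch below shows one hypothetical accounting, not the paper's: it assumes the shared component is stored as two rank-r factors at 16 bits each, and that every residual vector carries one 16-bit norm. The function `effective_bits_per_entry` and its parameters are illustrative names.

```python
def effective_bits_per_entry(n_tokens: int, head_dim: int, rank: int,
                             residual_bits: float, factor_bits: int = 16,
                             norm_bits: int = 16) -> float:
    """Hypothetical bit budget for 'low-rank part + quantized residual'.

    Assumes (not from the paper) that the shared component is stored as
    factors of shape (n_tokens x rank) and (head_dim x rank) at
    `factor_bits` each, and that each residual vector keeps one
    `norm_bits` scale alongside its `residual_bits`-per-entry code.
    """
    entries = n_tokens * head_dim
    factor_cost = factor_bits * rank * (n_tokens + head_dim)  # both factors
    norm_cost = norm_bits * n_tokens                          # one norm per token
    return residual_bits + (factor_cost + norm_cost) / entries


if __name__ == "__main__":
    # 4096 cached tokens, head dimension 128, rank-4 shared part, 2-bit residual.
    print(effective_bits_per_entry(4096, 128, 4, 2))  # 2.640625
```

Under these assumptions the overhead falls as the cache grows, because the factor cost scales with n + d while the number of entries scales with n * d.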

Abstract

We show that the key-value (KV) cache in transformer attention heads admits a natural decomposition into a low-rank shared context component and a full-rank per-token residual, well described by the spiked random matrix model. This observation leads to eOptShrinkQ, a two-stage compression pipeline: optimal singular value shrinkage (eOptShrink) automatically extracts the shared structure, and the residual, which satisfies the thin shell property with delocalized coordinates, is quantized by TurboQuant (Zandieh et al., 2025), a recently proposed per-vector scalar quantizer with near-optimal distortion guarantees. By restoring the isotropy that scalar quantization assumes, spectral denoising eliminates the need for both outlier handling and dedicated inner-product bias correction, freeing those bits for improved reconstruction. The theoretical grounding in…
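The two-stage idea in the abstract (extract a shared low-rank part, then scalar-quantize the residual vector by vector) can be sketched in NumPy. This is a toy illustration under stated assumptions, not the paper's code. A fixed-rank truncated SVD stands in for eOptShrink, which instead selects the rank and shrinks each singular value optimally. A simplified quantizer (one shared random rotation, a per-vector norm, and a uniform `bits`-bit grid per coordinate) stands in for TurboQuant, whose actual codebook differs. The names `shrink_low_rank`, `quantize_rows`, and `compress_kv` are hypothetical.

```python
import numpy as np


def shrink_low_rank(X: np.ndarray, rank: int) -> np.ndarray:
    """Stand-in for eOptShrink: keep only the top-`rank` singular components.

    The actual method picks the rank and shrinks singular values optimally;
    here `rank` is a hypothetical, fixed input.
    """
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :rank] * s[:rank]) @ Vt[:rank]


def quantize_rows(R: np.ndarray, bits: int, seed: int = 0) -> np.ndarray:
    """Simplified per-vector scalar quantizer (NOT TurboQuant's codebook).

    Rows are rotated by one shared random orthogonal matrix, scaled to unit
    norm, quantized coordinate-wise on a uniform `bits`-bit grid, rescaled,
    and rotated back. Returns the dequantized rows.
    """
    n, d = R.shape
    rng = np.random.default_rng(seed)
    Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # random orthogonal matrix
    Y = R @ Q
    norms = np.linalg.norm(Y, axis=1, keepdims=True)
    # Unit-normalize rows; zero rows stay zero instead of dividing by zero.
    U = np.divide(Y, norms, out=np.zeros_like(Y), where=norms > 0)
    levels = 2 ** bits
    clip = 3.0 / np.sqrt(d)  # hypothetical range: coordinates are roughly N(0, 1/d)
    step = 2 * clip / levels
    idx = np.clip(np.floor((U + clip) / step), 0, levels - 1)
    U_hat = -clip + (idx + 0.5) * step  # midpoint of each quantization cell
    return (U_hat * norms) @ Q.T


def compress_kv(X: np.ndarray, rank: int, bits: int) -> np.ndarray:
    """Low-rank shared part (kept exact in this sketch) plus quantized residual."""
    L = shrink_low_rank(X, rank)
    return L + quantize_rows(X - L, bits)


if __name__ == "__main__":
    # Synthetic spiked model: rank-2 shared context plus isotropic per-token noise.
    rng = np.random.default_rng(1)
    n, d = 512, 64
    X = 5.0 * rng.standard_normal((n, 2)) @ rng.standard_normal((2, d)) / np.sqrt(d)
    X += rng.standard_normal((n, d)) / np.sqrt(d)

    def rel_err(A, B):
        return np.linalg.norm(A - B) / np.linalg.norm(A)

    print("direct 2-bit:", rel_err(X, quantize_rows(X, bits=2)))
    print("low-rank + 2-bit residual:", rel_err(X, compress_kv(X, rank=2, bits=2)))
```

In this toy run the combined error is lower largely because the shared component is kept in full precision and its storage is not charged; a fair comparison must also count the bits spent on that component.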
