TurboAngle: Near-Lossless KV Cache Compression via Uniform Angle Quantization

Dipkumar Patel

arXiv:2603.27467·cs.LG·March 31, 2026

TL;DR

TurboAngle compresses the KV cache by uniformly quantizing angles in the Walsh-Hadamard domain and configures K and V codebook sizes independently at each layer, reaching 3.28 to 3.67 angle bits per element with near-lossless quality on six of the seven models tested.

Contribution

The paper proposes a uniform angle quantizer combined with per-layer early-boost, which allocates higher precision to a model-specific subset of critical layers, for near-lossless KV cache compression across seven models ranging from 1B to 7B parameters.

Findings

01

Per-layer early-boost achieves lossless compression on four of the seven models (1B to 7B parameters) and near-lossless quality on six of the seven.

02

Uses 3.28 to 3.67 angle bits per element, with asymmetric norm quantization: 8-bit for keys and 4-bit log-space for values.

03

Reveals model-specific bottleneck patterns through layer-group sensitivity analysis.

Abstract

We compress KV cache entries by quantizing angles in the Fast Walsh-Hadamard domain, where a random diagonal rotation makes consecutive element pairs approximately uniformly distributed on the unit circle. We extend this angular quantizer with per-layer early-boost, which independently configures K and V codebook sizes at each layer, allocating higher precision to a model-specific subset of critical layers. Across seven models (1B to 7B parameters), per-layer early-boost achieves lossless compression on four models and near-lossless quality on six of seven, at 3.28 to 3.67 angle bits per element. Asymmetric norm quantization (8-bit for keys, 4-bit log-space for values) yields 6.56 total bits per element on Mistral-7B with perplexity degradation of +0.0014 and no calibration data. A layer-group sensitivity analysis reveals model-specific bottleneck patterns, including K-dominated versus…
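The core idea in the abstract (random diagonal rotation, fast Walsh-Hadamard transform, then uniform quantization of the angle of each consecutive element pair) can be illustrated with a minimal NumPy sketch. This is not the paper's implementation: the function names `fwht`, `encode`, and `decode`, the head dimension of 128, and the codebook size of 128 bins are all assumptions chosen for illustration, and the pair radii are kept in full precision here instead of applying the 8-bit and 4-bit log-space norm quantization the abstract describes.

```python
import numpy as np

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform along the last axis.

    The last dimension must be a power of two. With the 1/sqrt(d)
    scaling the transform is its own inverse.
    """
    x = np.asarray(x, dtype=np.float64).copy()
    lead, d = x.shape[:-1], x.shape[-1]
    assert d > 0 and d & (d - 1) == 0, "length must be a power of two"
    h = 1
    while h < d:
        y = x.reshape(*lead, d // (2 * h), 2, h)
        a = y[..., 0, :] + y[..., 1, :]
        b = y[..., 0, :] - y[..., 1, :]
        x = np.stack([a, b], axis=-2).reshape(*lead, d)
        h *= 2
    return x / np.sqrt(d)

def encode(x, signs, n_bins):
    """Rotate, transform, and quantize the angle of each consecutive pair."""
    y = fwht(x * signs)                      # random diagonal rotation, then FWHT
    pairs = y.reshape(*y.shape[:-1], -1, 2)  # consecutive element pairs
    radius = np.hypot(pairs[..., 0], pairs[..., 1])
    angle = np.arctan2(pairs[..., 1], pairs[..., 0])
    step = 2 * np.pi / n_bins                # uniform angular codebook
    idx = np.round(angle / step).astype(np.int64) % n_bins
    return idx, radius                       # radius left unquantized (simplification)

def decode(idx, radius, signs, n_bins):
    """Invert encode(): rebuild pairs from (angle index, radius), undo the transform."""
    angle = idx * (2 * np.pi / n_bins)
    pairs = np.stack([radius * np.cos(angle), radius * np.sin(angle)], axis=-1)
    y = pairs.reshape(*pairs.shape[:-2], -1)
    return fwht(y) * signs                   # FWHT and the sign flip are self-inverse

# Illustrative values only (assumptions, not taken from the paper).
rng = np.random.default_rng(0)
d, n_bins = 128, 128
signs = rng.choice([-1.0, 1.0], size=d)      # random diagonal of +/-1
x = rng.standard_normal((16, d))             # stand-in for 16 cached key vectors

idx, radius = encode(x, signs, n_bins)
x_hat = decode(idx, radius, signs, n_bins)
rel_err = np.linalg.norm(x - x_hat) / np.linalg.norm(x)
bits = np.log2(n_bins) / 2                   # one angle index is shared by two elements
print(f"angle bits per element: {bits}, relative error: {rel_err:.4f}")
```

In this sketch each angle index covers two elements, so a codebook of 128 bins costs 3.5 angle bits per element, and because the transform is orthonormal the relative reconstruction error is bounded by 2·sin(π / (2·n_bins)). Using different `n_bins` values for keys and values, and for different layers, is one way to picture the per-layer allocation the abstract refers to.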
