Statistical Inference and Quality Measures of KV Cache Quantisations Inspired by TurboQuant

Paolo D'Alberto

arXiv:2605.08114 · cs.LG · May 12, 2026

TL;DR

This paper compares three KV cache quantization schemes (KV, KQV, and QKQV) under a fair bit budget, shows how their relative performance varies with budget and distribution, and uses statistical inference and information metrics to expose their practical differences.

Contribution

It provides a detailed analysis of KV cache quantization schemes, highlighting the conditions under which each scheme performs best and explaining the impact of Jensen's inequality on quantization errors.
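
As a worked illustration of the Jensen's inequality effect, assume (for illustration only; this is not a model taken from the paper) that quantization perturbs an attention logit $s$ by zero-mean Gaussian noise $\varepsilon \sim \mathcal{N}(0, \sigma^2)$. Because the exponential is convex, the expected softmax numerator is biased upward even though the logit itself is unbiased:

$$\mathbb{E}\left[e^{s+\varepsilon}\right] = e^{s}\,e^{\sigma^2/2} \;\ge\; e^{s} = e^{\mathbb{E}[s+\varepsilon]}.$$

Under this assumption the bias factor $e^{\sigma^2/2}$ grows exponentially in the noise variance, so any inflation of $\sigma^2$ is amplified nonlinearly once it passes through softmax.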

Findings

01

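At a budget of $n = 4$, KQV outperforms the other schemes on every measure tested (KL divergence, geometric $K$ error, and 6D distance), across all distributions and ranks tested.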

02

QKQV is consistently worse than KQV in KL divergence at every budget and for every distribution tested.

03

A budget-dependent crossover exists where QKQV outperforms KQV at certain budgets, revealing an open rate-distortion problem.

Abstract

We analyse three KV cache quantization schemes under a fair bit budget: KV (scalar MSE baseline), KQV (WHT + MSE on $K$; WHT + MSE + QJL on $V$), and QKQV (WHT + MSE + QJL on both). Starting from the Beta distribution on the hypersphere, we trace how QJL on $K$ inflates inner product variance by $\pi/2$, which softmax amplifies nonlinearly via Jensen's inequality, and we present statistical inference and information metrics to highlight practical differences. Three empirical findings emerge. (1) At $n = 4$ (the practically dominant budget), KQV wins on every measure (KL divergence, geometric $K$ error, and 6D distance) across all distributions and ranks tested. (2) The K-V asymmetry is unconditional: QKQV is consistently worse than KQV in KL divergence at every budget and distribution. (3) A budget-dependent crossover exists: QKQV achieves better…
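
One way to see where a $\pi/2$ factor can come from is a small simulation of a sign-quantized Johnson-Lindenstrauss inner-product estimator. The sketch below is an assumption about how a QJL-style estimator behaves, not the paper's implementation: the function name `qjl_estimate`, the Gaussian projection matrix `S`, and the dimensions `d`, `m`, and `trials` are all chosen here for illustration. For unit vectors, each sign-quantized projection has second moment $\pi/2$, so the variance of the averaged estimate is $(\pi/2 - \langle q, k \rangle^2)/m$.

```python
import numpy as np

def qjl_estimate(q, k, S):
    """Estimate <q, k> while storing only sign(S @ k) and ||k|| for the key.

    Illustrative sketch of a sign-quantized JL estimator (assumed form).
    """
    m = S.shape[0]
    signs = np.sign(S @ k)  # one bit per random projection of the key
    scale = np.sqrt(np.pi / 2) * np.linalg.norm(k) / m
    return scale * ((S @ q) @ signs)

rng = np.random.default_rng(0)
d, m, trials = 32, 128, 2000  # hypothetical sizes, chosen for a quick run

# Random unit-norm query and key.
q = rng.standard_normal(d)
q /= np.linalg.norm(q)
k = rng.standard_normal(d)
k /= np.linalg.norm(k)

# Draw a fresh Gaussian projection for every trial.
estimates = np.array(
    [qjl_estimate(q, k, rng.standard_normal((m, d))) for _ in range(trials)]
)

exact = float(q @ k)
mean_est = float(estimates.mean())
var_est = float(estimates.var())
var_theory = (np.pi / 2 - exact**2) / m  # per-row second moment is pi/2

print(f"exact inner product : {exact:.4f}")
print(f"mean of estimates   : {mean_est:.4f}")
print(f"empirical variance  : {var_est:.5f}")
print(f"theoretical variance: {var_theory:.5f}")
```

The estimate is unbiased in this setup, so the cost of keeping only sign bits shows up entirely as extra variance in the inner product, which is the quantity that feeds the softmax logits.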
