IsoQuant: Hardware-Aligned SO(4) Isoclinic Rotations for LLM KV Cache Compression

Zhongping Ji

arXiv:2603.28430·cs.LG·March 31, 2026

TL;DR

IsoQuant is a quaternion-based, hardware-aligned blockwise rotation framework for low-bit vector quantization of the key-value cache in large language models. It reports 4.5x to 4.7x kernel speedups over RotorQuant at comparable reconstruction MSE.

Contribution

It proposes a blockwise rotation framework that uses quaternion algebra and the isoclinic decomposition of SO(4) to apply hardware-efficient 4D rotations, improving on dense orthogonal transforms and on the 3D Clifford rotors of RotorQuant.

Findings

01

IsoQuant-Full reduces rotation cost by over 50% compared to RotorQuant.

02

Achieves 4.5x to 4.7x kernel speedups over RotorQuant.

03

Maintains comparable reconstruction MSE with peak speedups above 6x.

Abstract

Orthogonal feature decorrelation is effective for low-bit online vector quantization, but dense random orthogonal transforms incur prohibitive $O(d^2)$ storage and compute. RotorQuant reduces this cost with blockwise 3D Clifford rotors, yet the resulting 3D partition is poorly aligned with modern hardware and offers limited local mixing. We propose IsoQuant, a blockwise rotation framework based on quaternion algebra and the isoclinic decomposition of $SO(4)$. It represents each 4D block as a quaternion and applies a closed-form transform $T(v) = q_L v \overline{q_R}$. This yields two main variants: IsoQuant-Full, which realizes the full $SO(4)$ rotation, and IsoQuant-Fast, which keeps only one isoclinic factor for lower cost; the framework also admits a lightweight 2D special case. At $d = 128$, IsoQuant-Full reduces forward rotation cost from about…
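The closed-form transform $T(v) = q_L v \overline{q_R}$ from the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's kernel: the function names, the (w, x, y, z) component order, the Gaussian sampling of unit quaternions, and the choice of the left factor for the single-factor case are all hypothetical choices made for this example.

```python
import math
import random

def qmul(a, b):
    """Hamilton product of two quaternions given as (w, x, y, z) tuples."""
    aw, ax, ay, az = a
    bw, bx, by, bz = b
    return (
        aw * bw - ax * bx - ay * by - az * bz,
        aw * bx + ax * bw + ay * bz - az * by,
        aw * by - ax * bz + ay * bw + az * bx,
        aw * bz + ax * by - ay * bx + az * bw,
    )

def qconj(q):
    """Quaternion conjugate: negate the three imaginary components."""
    w, x, y, z = q
    return (w, -x, -y, -z)

def random_unit_quaternion(rng):
    """Assumed sampling scheme: normalize a 4D Gaussian draw."""
    q = [rng.gauss(0.0, 1.0) for _ in range(4)]
    n = math.sqrt(sum(c * c for c in q))
    return tuple(c / n for c in q)

IDENTITY = (1.0, 0.0, 0.0, 0.0)

def rotate_block(v, q_l, q_r=IDENTITY):
    """T(v) = q_L v conj(q_R) on one 4D block treated as a quaternion.

    Leaving q_r at the identity keeps a single isoclinic factor
    (which factor the paper keeps is an assumption here).
    """
    return qmul(qmul(q_l, v), qconj(q_r))

def inverse_rotate_block(u, q_l, q_r=IDENTITY):
    """Inverse for unit quaternions: v = conj(q_L) u q_R."""
    return qmul(qmul(qconj(q_l), u), q_r)

def rotate_vector(x, params):
    """Apply one (q_L, q_R) pair to each consecutive 4D block of x."""
    out = []
    for i, (q_l, q_r) in enumerate(params):
        out.extend(rotate_block(tuple(x[4 * i:4 * i + 4]), q_l, q_r))
    return out

def inverse_rotate_vector(y, params):
    """Undo rotate_vector block by block."""
    out = []
    for i, (q_l, q_r) in enumerate(params):
        out.extend(inverse_rotate_block(tuple(y[4 * i:4 * i + 4]), q_l, q_r))
    return out
```

With unit quaternions each block keeps its Euclidean norm, and the transform is undone by $\overline{q_L}\, u\, q_R$. For $d = 128$ this layout has 32 blocks and stores 256 quaternion components when both factors are kept, compared with 16384 entries for a dense $128 \times 128$ matrix.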
