Leech Lattice Vector Quantization for Efficient LLM Compression

Tycho F. A. van der Ouderaa; Mart van Baalen; Paul Whatmough; Markus Nagel

arXiv:2603.11021·cs.LG·March 12, 2026

Open Access

TL;DR

This paper introduces Leech Lattice Vector Quantization (LLVQ), a practical method for compressing large language models that builds on the 24-dimensional Leech lattice and outperforms recent quantization methods such as QuIP#, QTIP, and PVQ.

Contribution

The paper develops a practical LLVQ algorithm supporting indexing, angular search, and parallel dequantization, enabling efficient LLM compression with state-of-the-art results.

Findings

01 LLVQ outperforms recent quantization methods such as QuIP#, QTIP, and PVQ.

02 The approach leverages the optimal sphere packing and kissing configurations of the 24-dimensional Leech lattice for vector quantization.

03 The algorithm supports indexing without materializing the codebook, as well as parallel dequantization, for scalable model compression.

Abstract

Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters jointly, practical implementations must avoid expensive lookup mechanisms and other forms of explicit codebook storage. Lattice approaches address this through highly structured and dense packing. This paper explores the Leech lattice, which, with its optimal sphere packing and kissing configurations in 24 dimensions, is the highest-dimensional lattice known with such optimal properties. To make the Leech lattice usable for LLM quantization, we extend an existing search algorithm based on the extended Golay code construction to (i) support indexing, enabling conversion to and from bitstrings without materializing the codebook, (ii) allow angular search over a union of Leech lattice shells,…
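To give a sense of the codebook sizes involved when indexing a union of Leech lattice shells, the sketch below counts the lattice vectors on each shell and converts the total into a fixed-length index rate per weight. It is not the paper's indexing or search algorithm. It relies on the classical identity that the number of Leech lattice vectors of squared norm 2m is 65520/691 · (σ₁₁(m) − τ(m)), where τ is the Ramanujan tau function. The function names `leech_shell_sizes` and `bits_per_weight` are hypothetical, and the rate calculation assumes a plain fixed-length index over the union of shells, with any bits spent on scales or gains not counted.

```python
import math


def leech_shell_sizes(max_m):
    """Count Leech lattice vectors of squared norm 2*m for m = 1..max_m.

    Uses N(m) = 65520/691 * (sigma_11(m) - tau(m)), with tau the
    Ramanujan tau function (coefficients of q * prod (1 - q^n)^24).
    """
    # Coefficients of prod_{n>=1} (1 - q^n)^24, truncated after q^(max_m - 1).
    poly = [0] * max_m
    poly[0] = 1
    for n in range(1, max_m):
        for _ in range(24):
            # Multiply in place by (1 - q^n). Iterating downwards ensures
            # poly[k - n] still holds its value from before this multiply.
            for k in range(max_m - 1, n - 1, -1):
                poly[k] -= poly[k - n]

    sizes = {}
    for m in range(1, max_m + 1):
        tau = poly[m - 1]  # the extra factor q shifts the index by one
        sigma11 = sum(d ** 11 for d in range(1, m + 1) if m % d == 0)
        numerator = 65520 * (sigma11 - tau)
        assert numerator % 691 == 0  # the identity always yields an integer
        sizes[m] = numerator // 691
    return sizes


def bits_per_weight(max_m):
    """Fixed-length index rate (bits per weight) for shells 1..max_m.

    Assumption: one index selects any vector in the union of shells,
    and each vector encodes a block of 24 weights.
    """
    total = sum(leech_shell_sizes(max_m).values())
    return math.log2(total) / 24


if __name__ == "__main__":
    for m, count in leech_shell_sizes(4).items():
        print(f"squared norm {2 * m}: {count} vectors")
    for max_m in (2, 3, 4):
        print(f"shells up to norm {2 * max_m}: {bits_per_weight(max_m):.3f} bits/weight")
```

The first non-empty shell (squared norm 4) contains 196560 vectors, which is the kissing number in 24 dimensions. The next two contain 16773120 and 398034000 vectors. Codebooks of this size are why indexing without materializing the codebook matters: the index stays short, but an explicit lookup table would not be practical.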

Taxonomy

Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Algorithms and Data Compression