Leech Lattice Vector Quantization for Efficient LLM Compression
Tycho F. A. van der Ouderaa, Mart van Baalen, Paul Whatmough, Markus Nagel

TL;DR
This paper introduces Leech Lattice Vector Quantization (LLVQ), a practical method for compressing large language models that quantizes weights on the 24-dimensional Leech lattice, whose sphere packing is optimal in that dimension, and that surpasses recent quantization techniques.
Contribution
The paper develops a practical LLVQ algorithm supporting indexing, angular search, and parallel dequantization, enabling efficient LLM compression with state-of-the-art results.
Findings
LLVQ outperforms recent quantization methods such as QuIP#, QTIP, and PVQ.
The approach exploits the Leech lattice's optimal sphere packing and kissing configuration in 24 dimensions for vector quantization.
The algorithm supports efficient indexing and parallel dequantization for scalable model compression.
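As a rough illustration of what indexing without a stored codebook implies for bit rate, the sketch below counts how many bits are needed to index every vector on the innermost shells of the Leech lattice, then divides by the 24 dimensions. The shell sizes are the leading coefficients of the lattice's theta series. Which shells LLVQ actually uses, and how any scale information is stored, is not stated in this summary, so `SHELL_SIZES`, `index_bits`, and `bits_per_weight` are hypothetical names for illustration only.

```python
# Number of Leech lattice vectors on the three innermost nonzero shells
# (leading theta-series coefficients; the first is the kissing number).
SHELL_SIZES = [196_560, 16_773_120, 398_034_000]
DIM = 24  # dimension of the Leech lattice


def index_bits(num_codewords):
    """Smallest number of bits that gives every codeword a distinct index."""
    return (num_codewords - 1).bit_length()


def bits_per_weight(num_shells):
    """Rate if the codebook were the union of the innermost shells.

    Assumption: one index per 24-weight block and no side information
    (for example a per-block scale) is counted.
    """
    codewords = sum(SHELL_SIZES[:num_shells])
    return index_bits(codewords) / DIM


if __name__ == "__main__":
    for k in range(1, len(SHELL_SIZES) + 1):
        print(k, sum(SHELL_SIZES[:k]), round(bits_per_weight(k), 3))
```

Under these assumptions the first shell alone fits in 18 bits, or 0.75 bits per weight, which shows why a union of shells is needed to reach higher bit rates.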
Abstract
Scalar quantization of large language models (LLMs) is fundamentally limited by information-theoretic bounds. While vector quantization (VQ) overcomes these limits by encoding blocks of parameters jointly, practical implementations must avoid the need for expensive lookup mechanisms or other explicit codebook storage. Lattice approaches address this through highly structured and dense packing. This paper explores the Leech lattice, which, with its optimal sphere packing and kissing configurations at 24 dimensions, is the highest-dimensional lattice known with such optimal properties. To make the Leech lattice usable for LLM quantization, we extend an existing search algorithm based on the extended Golay code construction to i) support indexing, enabling conversion to and from bitstrings without materializing the codebook, ii) allow angular search over a union of Leech lattice shells,…
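To make concrete how a lattice quantizer avoids explicit codebook storage, here is a minimal sketch of nearest-point search in the 8-dimensional E8 lattice, using the classic rounding-based decoder. This is not the Leech lattice search described in the abstract, which builds on the extended Golay code; E8 is swapped in only because its decoder fits in a few lines. The function names `nearest_dn` and `nearest_e8` are illustrative.

```python
def nearest_dn(x):
    """Nearest point of D_n (integer vectors with even coordinate sum) to x."""
    rounded = [round(v) for v in x]
    if sum(rounded) % 2 == 0:
        return rounded
    # Odd parity: re-round the coordinate with the largest rounding error
    # in the opposite direction, which is the cheapest way to fix the parity.
    errors = [v - r for v, r in zip(x, rounded)]
    k = max(range(len(x)), key=lambda i: abs(errors[i]))
    rounded[k] += 1 if errors[k] >= 0 else -1
    return rounded


def sq_dist(x, p):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((v - q) ** 2 for v, q in zip(x, p))


def nearest_e8(x):
    """Nearest point of E8, viewed as D_8 united with D_8 + (1/2, ..., 1/2)."""
    assert len(x) == 8
    a = nearest_dn(x)
    b = [r + 0.5 for r in nearest_dn([v - 0.5 for v in x])]
    # Decode in both cosets and keep the closer candidate.
    return a if sq_dist(x, a) <= sq_dist(x, b) else b


if __name__ == "__main__":
    weights = [0.9, -0.2, 0.4, 1.3, -0.6, 0.1, 0.7, -1.1]
    print(nearest_e8(weights))
```

The quantizer never enumerates codewords: the lattice structure itself defines the codebook, so the only work per block is rounding and a parity check. The same principle is what makes larger lattices attractive when a lookup table would be too big to store.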
Taxonomy
Topics: Advanced Data Compression Techniques · Speech Recognition and Synthesis · Algorithms and Data Compression
