Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression
Xi Zhang, Xiaolin Wu, Jiamang Wang, Weisi Lin

TL;DR
This paper proposes Grouped Lattice Vector Quantization (GLVQ), a post-training quantization method for low-bit LLM compression that learns a customized lattice codebook for each group of weights to improve the accuracy-size trade-off.
Contribution
Introduces a learnable lattice codebook framework with Babai rounding for stable training, enhancing low-bit LLM quantization performance.
Findings
Outperforms existing post-training quantization (PTQ) baselines on the accuracy-size trade-off
Enables efficient decoding via matrix-vector multiplication
Demonstrates effectiveness across multiple benchmarks
Abstract
Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing weights in lower bit-width formats. However, standard uniform quantization often leads to notable performance degradation, particularly in low-bit scenarios. In this work, we introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook, defined by a learnable generation matrix. To address the non-differentiability of the quantization process, we adopt Babai rounding to approximate nearest-lattice-point search during training, which enables stable optimization of the generation matrices. Once trained, decoding reduces to a simple matrix-vector multiplication, yielding an efficient and…
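The two operations named in the abstract, Babai rounding for encoding and a matrix-vector product for decoding, can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: it assumes the round-off variant of Babai's algorithm, a square invertible generation matrix whose columns are the lattice basis vectors, and a toy 2-dimensional group. The names `babai_round` and `decode` and the example values are hypothetical, and the learning of the generation matrix is omitted.

```python
import numpy as np


def babai_round(G: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Approximate nearest-lattice-point search (Babai round-off, assumed variant).

    G: (d, d) invertible generation matrix, columns are basis vectors.
    w: (d,) weight vector to quantize.
    Returns the integer coordinate vector z such that G @ z is close to w.
    """
    # Express w in the lattice basis, then round each coordinate to an integer.
    coeffs = np.linalg.solve(G, w)
    return np.rint(coeffs).astype(np.int64)


def decode(G: np.ndarray, z: np.ndarray) -> np.ndarray:
    """Reconstruct the quantized weights with one matrix-vector product."""
    return G @ z


# Toy example (illustrative values, not taken from the paper).
G = np.array([[1.0, 0.5],
              [0.0, 1.0]])
w = np.array([0.9, 1.4])

z = babai_round(G, w)      # integer codes to store
w_hat = decode(G, z)       # dequantized weights

print(z)      # [0 1]
print(w_hat)  # [0.5 1. ]
```

Round-off is only an approximation of the true nearest lattice point, and its quality depends on how close to orthogonal the basis in `G` is. Only the integer codes `z` and the per-group matrix `G` need to be kept for decoding in this sketch.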
