Learning Grouped Lattice Vector Quantizers for Low-Bit LLM Compression

Xi Zhang; Xiaolin Wu; Jiamang Wang; Weisi Lin

arXiv:2510.20984·cs.LG·January 27, 2026

TL;DR

This paper proposes Grouped Lattice Vector Quantization (GLVQ) for low-bit LLM compression. Each group of weights gets its own learned lattice codebook, which improves the accuracy-size trade-off of post-training quantization.

Contribution

Introduces a framework of learnable per-group lattice codebooks, trained with Babai rounding for stable optimization, that improves low-bit LLM quantization.

Findings

01 Outperforms existing PTQ baselines on the accuracy-size trade-off

02 Enables efficient decoding via matrix-vector multiplication

03 Demonstrates effectiveness across multiple benchmarks

Abstract

Large Language Models (LLMs) have demonstrated remarkable capabilities but typically require extensive computational resources and memory for inference. Post-training quantization (PTQ) can effectively reduce these demands by storing weights in lower bit-width formats. However, standard uniform quantization often leads to notable performance degradation, particularly in low-bit scenarios. In this work, we introduce a Grouped Lattice Vector Quantization (GLVQ) framework that assigns each group of weights a customized lattice codebook, defined by a learnable generation matrix. To address the non-differentiability of the quantization process, we adopt Babai rounding to approximate nearest-lattice-point search during training, which enables stable optimization of the generation matrices. Once trained, decoding reduces to a simple matrix-vector multiplication, yielding an efficient and…
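
To make the encode and decode steps described in the abstract concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. It assumes the round-off variant of Babai's algorithm (round the coordinates of the weight vector in the lattice basis), a 2x2 generation matrix whose columns are the basis vectors, and hypothetical names (`babai_round`, `decode`, `G`, `weights`). The training procedure and the handling of gradients through the rounding step are not covered here, since the abstract does not specify them.

```python
def matvec(G, z):
    """Multiply matrix G (given as a list of rows) by vector z."""
    return [sum(g * v for g, v in zip(row, z)) for row in G]


def inverse_2x2(G):
    """Closed-form inverse of a 2x2 matrix (illustration only)."""
    (a, b), (c, d) = G
    det = a * d - b * c
    if det == 0:
        raise ValueError("generation matrix must be invertible")
    return [[d / det, -b / det], [-c / det, a / det]]


def babai_round(G, x):
    """Babai round-off: integer lattice coordinates z = round(G^{-1} x).

    This approximates the nearest lattice point; it is exact only for
    sufficiently well-conditioned bases.
    """
    coeffs = matvec(inverse_2x2(G), x)
    return [round(c) for c in coeffs]


def decode(G, z):
    """Decoding is a single matrix-vector product: x_hat = G z."""
    return matvec(G, z)


# Hypothetical generation matrix for one weight group (columns = basis).
G = [[1.0, 0.5],
     [0.0, 1.0]]
weights = [1.3, 0.9]          # a 2-dimensional slice of one weight group

codes = babai_round(G, weights)   # integer codes that would be stored
recon = decode(G, codes)          # dequantized weights
```

Only the integer codes and the small matrix `G` need to be stored per group, which is why decoding costs one matrix-vector product per weight vector. In higher dimensions one would solve a linear system with a numerical library instead of using the closed-form 2x2 inverse.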
