QuIP#: Even Better LLM Quantization with Hadamard Incoherence and Lattice Codebooks

Albert Tseng; Jerry Chee; Qingyao Sun; Volodymyr Kuleshov; Christopher De Sa

arXiv:2402.04396 · cs.LG · June 5, 2024 · 1 citation

TL;DR

QuIP# is a post-training quantization (PTQ) method for large language models that combines randomized Hadamard transforms, E8-lattice-based codebooks, and fine-tuning to achieve state-of-the-art compression at extremely low bit widths (4 bits per weight or fewer) while supporting fast, accurate inference.

Contribution

Introduces QuIP#, a weight-only PTQ method that combines randomized-Hadamard incoherence processing, E8 lattice codebooks, and fine-tuning to improve LLM weight compression.
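The incoherence step can be illustrated with a small pure-Python sketch of a randomized Hadamard transform (RHT): multiply by a random ±1 diagonal, then by a normalized Hadamard matrix, applied on both sides of a weight matrix. This is a sketch under stated assumptions, not the authors' implementation: it handles only power-of-two dimensions, uses a seeded PRNG to stand in for the stored sign vectors, and the function names (`fwht`, `random_signs`, `rht`, `inverse_rht`, `rht_matrix`) are hypothetical.

```python
import math
import random


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in O(n log n).

    Computes H @ x for the Sylvester Hadamard matrix H (entries +/-1).
    Assumption of this sketch: len(x) must be a power of two.
    """
    x = list(x)
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x


def random_signs(n, seed=0):
    """Random +/-1 diagonal stored as a list; the seed stands in for a stored key."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]


def rht(x, signs):
    """Randomized Hadamard transform: (1/sqrt(n)) * H * diag(signs) * x."""
    n = len(x)
    y = fwht([s * v for s, v in zip(signs, x)])
    return [v / math.sqrt(n) for v in y]


def inverse_rht(y, signs):
    """Inverse RHT: diag(signs) * (1/sqrt(n)) * H * y.

    H / sqrt(n) is symmetric and orthogonal, and diag(signs) is its own inverse.
    """
    n = len(y)
    z = fwht(y)
    return [s * v / math.sqrt(n) for s, v in zip(signs, z)]


def rht_matrix(W, row_signs, col_signs):
    """Two-sided transform U @ W @ V.T for an m x n matrix W, where
    U = H diag(row_signs) / sqrt(m) and V = H diag(col_signs) / sqrt(n)."""
    # Right-multiplying by V.T maps each row w_i to V @ w_i.
    rows = [rht(r, col_signs) for r in W]
    # Left-multiplying by U maps each column c to U @ c.
    cols = [rht(list(c), row_signs) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


# A single large entry (an "outlier") gets spread evenly across all coordinates.
w = [8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
signs = random_signs(len(w), seed=42)
y = rht(w, signs)
print(max(abs(v) for v in y))  # 8 / sqrt(8), about 2.83
```

Because the transform is orthogonal, it preserves the Frobenius norm and is exactly invertible, so quantization can happen in the transformed basis and the rotation can be inverted afterward; the random signs are what make large entries unlikely to survive the rotation.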

Findings

01. Outperforms existing PTQ methods in extreme compression regimes.

02. Supports fast inference with minimal accuracy loss.

03. Enables new behaviors in PTQ scaling.

Abstract

Post-training quantization (PTQ) reduces the memory footprint of LLMs by quantizing their weights to low precision. In this work, we introduce QuIP#, a weight-only PTQ method that achieves state-of-the-art results in extreme compression regimes ($\leq$ 4 bits per weight) using three novel techniques. First, QuIP# improves QuIP's (Chee et al., 2023) incoherence processing by using the randomized Hadamard transform, which is faster and has better theoretical properties. Second, QuIP# uses vector quantization to take advantage of the ball-shaped sub-Gaussian distribution that incoherent weights possess: specifically, we introduce a set of hardware-efficient codebooks based on the highly symmetric $E_{8}$ lattice, which achieves the optimal 8-dimensional unit ball packing. Third, QuIP# uses fine-tuning to improve fidelity to the original model. Our experiments show that QuIP# outperforms…
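To make the lattice-codebook idea concrete, here is a pure-Python sketch of nearest-point search in the infinite E8 lattice using the classic Conway and Sloane decoder. E8 is the union of D8 (integer vectors whose coordinates have an even sum) and D8 shifted by 1/2 in every coordinate, so one decodes within each coset and keeps the closer point. This illustrates the lattice only and is not QuIP#'s codebook: the paper's hardware-efficient codebooks are built from E8 but must restrict and index a finite set of points to meet a bit budget, which is not modeled here. The helper names and the `scale` parameter are hypothetical.

```python
import math


def nearest_d8(x):
    """Closest point of D8 (integer vectors with an even coordinate sum) to x."""
    f = [math.floor(v + 0.5) for v in x]  # round each coordinate to the nearest integer
    if sum(f) % 2 == 0:
        return f
    # Odd parity: re-round the coordinate with the largest rounding error the
    # other way, which is the cheapest way to restore an even sum.
    k = max(range(len(x)), key=lambda i: abs(x[i] - f[i]))
    f[k] += 1 if x[k] > f[k] else -1
    return f


def nearest_e8(x):
    """Closest point of E8 = D8 U (D8 + 1/2) to an 8-vector x."""
    if len(x) != 8:
        raise ValueError("E8 is 8-dimensional")
    a = [float(v) for v in nearest_d8(x)]                     # best integer-coset point
    b = [v + 0.5 for v in nearest_d8([u - 0.5 for u in x])]  # best half-integer-coset point
    da = sum((u - v) ** 2 for u, v in zip(x, a))
    db = sum((u - v) ** 2 for u, v in zip(x, b))
    return a if da <= db else b


def quantize_group(w, scale):
    """Hypothetical helper: snap a group of 8 weights to the lattice scaled by `scale`."""
    return [scale * v for v in nearest_e8([u / scale for u in w])]


print(nearest_e8([0.9, 0.2, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
# [1.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

A quantizer in this style would process weights eight at a time: divide by a scale, snap to the lattice, and store an index for the chosen point. The index encoding is omitted because it depends on which finite subset of E8 a real codebook keeps.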

Taxonomy

Topics: Machine learning (LLM compression)

Methods: Post-training quantization; vector quantization