QuIP: 2-Bit Quantization of Large Language Models With Guarantees

Jerry Chee; Yaohui Cai; Volodymyr Kuleshov; Christopher De Sa

arXiv:2307.13304 · cs.LG · January 17, 2024 · 24 cites

TL;DR

This paper introduces QuIP, a 2-bit post-training quantization method for large language models that pairs adaptive rounding with incoherence processing and is supported by theoretical analysis and empirical results.

Contribution

We propose QuIP, a new 2-bit quantization technique with incoherence processing, and provide the first theoretical analysis for LLM-scale quantization algorithms.

Findings

01 Incoherence preprocessing improves existing quantization methods.

02 QuIP achieves viable 2-bit quantization results for large language models.

03 Theoretical analysis applies to QuIP and existing methods like OPTQ.

Abstract

This work studies post-training parameter quantization in large language models (LLMs). We introduce quantization with incoherence processing (QuIP), a new method based on the insight that quantization benefits from incoherent weight and Hessian matrices, i.e., from the weights being even in magnitude and the directions in which it is important to round them accurately being unaligned with the coordinate axes. QuIP consists of two steps: (1) an adaptive rounding procedure minimizing a quadratic proxy objective; (2) efficient pre- and post-processing that ensures weight and Hessian incoherence via multiplication by random orthogonal matrices. We complement QuIP with the first theoretical analysis for an LLM-scale quantization algorithm, and show that our theory also applies to an existing method, OPTQ. Empirically, we find that our incoherence preprocessing improves several…
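The pre- and post-processing idea in step (2) can be illustrated with a small NumPy sketch. This is not the paper's implementation. It makes several assumptions: dense random orthogonal matrices drawn by QR decomposition stand in for the paper's efficient construction, plain round-to-nearest on a 4-level grid stands in for the adaptive rounding of step (1), the Hessian is not modeled at all, and the helper names (`random_orthogonal`, `evenness`, `quantize_nearest`) are hypothetical. It only shows that rotating a weight matrix with one large outlier makes its entries more even in magnitude before rounding, and that the rotation can be undone afterwards.

```python
import numpy as np

def random_orthogonal(n, rng):
    """Draw a random n x n orthogonal matrix via QR of a Gaussian matrix."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    # Flip column signs using diag(r) so the result does not depend on QR sign conventions.
    return q * np.sign(np.diag(r))

def evenness(w):
    """Illustrative measure (an assumption): largest |entry| divided by the RMS entry."""
    return np.abs(w).max() * np.sqrt(w.size) / np.linalg.norm(w)

def quantize_nearest(w, bits=2):
    """Round to the nearest point of a uniform grid with 2**bits levels on [-max|w|, max|w|].

    A stand-in for illustration only, not QuIP's adaptive rounding.
    """
    levels = 2 ** bits
    scale = np.abs(w).max()
    idx = np.round((w / scale + 1.0) / 2.0 * (levels - 1))
    idx = np.clip(idx, 0, levels - 1)
    return (idx / (levels - 1) * 2.0 - 1.0) * scale

rng = np.random.default_rng(0)
m, n = 64, 64

# Toy weight matrix: small Gaussian entries plus one large outlier.
w = rng.standard_normal((m, n)) * 0.02
w[3, 5] = 1.0

# Pre-processing: multiply by random orthogonal matrices on both sides.
u = random_orthogonal(m, rng)
v = random_orthogonal(n, rng)
w_rot = u @ w @ v.T

# Quantize in the rotated basis, then post-process by undoing the rotation.
w_hat = u.T @ quantize_nearest(w_rot) @ v

# Baseline: quantize the original matrix directly.
w_hat_plain = quantize_nearest(w)

err_rot = np.linalg.norm(w_hat - w)
err_plain = np.linalg.norm(w_hat_plain - w)

print("evenness before rotation:", evenness(w))
print("evenness after rotation: ", evenness(w_rot))
print("error with rotation:     ", err_rot)
print("error without rotation:  ", err_plain)
```

In this toy setting the outlier forces the unrotated grid to span a range far wider than the typical entry, so most weights land far from any grid point. After rotation the outlier's mass is spread across all entries and the grid fits the matrix much more tightly.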

Code & Models

Repositories

Cornell-RelaxML/QuIP
PyTorch · Official

Models

Jakubrd4/Bielik-11B-v2.3-Instruct-QuIP-2bit (Hugging Face)
model · 3 downloads

Videos

QuIP: 2-Bit Quantization of Large Language Models With Guarantees · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis