QuantEase: Optimization-based Quantization for Language Models

Kayhan Behdin; Ayan Acharya; Aman Gupta; Qingquan Song; Siyu Zhu; Sathiya Keerthi; Rahul Mazumder

arXiv:2309.01885 · stat.ML · December 4, 2023 · 1 citation

Open Access

TL;DR

QuantEase is a layer-wise, optimization-based quantization framework for large language models. It reaches state-of-the-art compression with minimal accuracy loss and has an efficient GPU-based implementation.

Contribution

The paper presents a novel coordinate descent algorithm for layer-wise quantization of LLMs, including an outlier-aware variant, enabling near or sub-3-bit quantization with high performance.

Findings

1. Achieves up to 15% improvement in perplexity and zero-shot accuracy over GPTQ.

2. Quantizes models like Falcon-180B in about 3 hours on a single GPU.

3. Outlier-aware approach reaches near or sub-3-bit quantization with minimal accuracy drop.

Abstract

With the rising popularity of Large Language Models (LLMs), there has been an increasing interest in compression techniques that enable their efficient deployment. This study focuses on the Post-Training Quantization (PTQ) of LLMs. Drawing from recent advances, our work introduces QuantEase, a layer-wise quantization framework where individual layers undergo separate quantization. The problem is framed as a discrete-structured non-convex optimization, prompting the development of algorithms rooted in Coordinate Descent (CD) techniques. These CD-based methods provide high-quality solutions to the complex non-convex layer-wise quantization problems. Notably, our CD-based approach features straightforward updates, relying solely on matrix and vector operations, circumventing the need for matrix inversion or decomposition. We also explore an outlier-aware variant of our approach, allowing…
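The coordinate descent idea in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the commonly used layer-wise reconstruction objective, minimizing ||WX - ŴX||_F^2 over quantized weights Ŵ, and a per-row uniform grid; neither detail is spelled out in the text above. The function names (make_grid, project, quantize_layer_cd, layer_error) are hypothetical. Each update changes one weight column at a time and needs only matrix-vector products with the Gram matrix X Xᵀ, with no matrix inversion or decomposition.

```python
import numpy as np


def make_grid(W, bits):
    """Per-row uniform grid (an assumption): offset and step for 2**bits levels."""
    lo = W.min(axis=1, keepdims=True)
    hi = W.max(axis=1, keepdims=True)
    scale = (hi - lo) / (2**bits - 1)
    scale = np.where(scale > 0, scale, 1.0)  # guard rows with constant weights
    return lo, scale


def project(V, lo, scale, bits):
    """Round each entry of V to the nearest level of its row's grid."""
    levels = np.clip(np.round((V - lo) / scale), 0, 2**bits - 1)
    return levels * scale + lo


def layer_error(W, W_hat, X):
    """Layer-wise reconstruction error ||W X - W_hat X||_F^2."""
    return float(np.linalg.norm((W - W_hat) @ X) ** 2)


def quantize_layer_cd(W, X, bits=3, n_sweeps=3):
    """Cyclic coordinate descent over weight columns (illustrative sketch)."""
    Sigma = X @ X.T                       # (p, p) Gram matrix of layer inputs
    lo, scale = make_grid(W, bits)
    W_hat = project(W, lo, scale, bits)   # start from round-to-nearest
    D = W_hat - W                         # current quantization residual
    for _ in range(n_sweeps):
        for j in range(W.shape[1]):
            s_jj = Sigma[j, j]
            if s_jj <= 0:                 # input feature j is always zero
                continue
            # Interaction of coordinate j with all other coordinates.
            g = D @ Sigma[:, j] - D[:, j] * s_jj
            # Unconstrained 1-D minimizer, then nearest grid point, which is
            # the exact minimizer of the 1-D quadratic over the grid.
            target = W[:, j] - g / s_jj
            W_hat[:, j] = project(target, lo[:, 0], scale[:, 0], bits)
            D[:, j] = W_hat[:, j] - W[:, j]
    return W_hat


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 16))          # toy layer: 8 outputs, 16 inputs
    X = rng.normal(size=(16, 64))         # toy calibration inputs
    lo, scale = make_grid(W, 3)
    W_rtn = project(W, lo, scale, 3)
    W_cd = quantize_layer_cd(W, X, bits=3)
    print("round-to-nearest error:", layer_error(W, W_rtn, X))
    print("coordinate descent error:", layer_error(W, W_cd, X))
```

Because every column update exactly minimizes a one-dimensional quadratic over the grid, and the current value is always a feasible grid point, the reconstruction error in this sketch cannot increase from one update to the next. The result is therefore never worse than the round-to-nearest starting point on the calibration inputs.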

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis