CDQuant: Greedy Coordinate Descent for Accurate LLM Quantization

Pranav Ajit Nair; Arun Sai Suggala

arXiv:2406.17542 · cs.LG · October 24, 2024

Open Access

TL;DR

CDQuant is a scalable greedy coordinate descent algorithm for LLM quantization that outperforms GPTQ and enhances existing PTQ methods, enabling efficient compression of billion-parameter models with minimal performance loss.

Contribution

We introduce CDQuant, a simple, scalable, and more effective alternative to GPTQ for post-training quantization of large language models.

Findings

1. CDQuant outperforms GPTQ in 2-4 bit weight quantization.
2. CDQuant improves the performance of PTQ techniques like QuIP and FrameQuant.
3. CDQuant scales efficiently to models with hundreds of billions of parameters.

Abstract

Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks, but their deployment is often constrained by their substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses greedy coordinate descent to minimize the layer-wise reconstruction loss and thereby obtain high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to…
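To make the core idea concrete, here is a minimal sketch of greedy coordinate descent on a layer-wise reconstruction loss. It is an illustration under stated assumptions, not the authors' implementation. It assumes a single weight vector w, a loss of the form (q - w)^T H (q - w) with H = X^T X built from calibration inputs X, a uniform quantization grid, and round-to-nearest initialization. The function names (`make_grid`, `round_to_nearest`, `recon_loss`, `greedy_cd_quantize`) are hypothetical and do not come from the paper.

```python
import numpy as np

def make_grid(w, bits):
    # Assumption: a uniform grid spanning the range of w with 2**bits levels.
    return np.linspace(w.min(), w.max(), 2 ** bits)

def round_to_nearest(w, grid):
    # Map each weight to its closest grid level.
    idx = np.abs(w[:, None] - grid[None, :]).argmin(axis=1)
    return grid[idx]

def recon_loss(w, q, H):
    # Layer-wise reconstruction loss ||X (q - w)||^2 written via H = X^T X.
    r = q - w
    return float(r @ H @ r)

def greedy_cd_quantize(w, H, grid, max_iters=200):
    # Start from round-to-nearest (an assumption of this sketch).
    q = round_to_nearest(w, grid)
    g = H @ (q - w)          # half of the gradient of the loss at q
    diag = np.diag(H).copy()
    for _ in range(max_iters):
        # delta[i, j]: change in coordinate i if it is moved to grid[j].
        delta = grid[None, :] - q[:, None]
        # Exact loss change for every single-coordinate move:
        # 2 * delta * g_i + delta^2 * H_ii.
        change = 2.0 * delta * g[:, None] + delta ** 2 * diag[:, None]
        i, j = np.unravel_index(np.argmin(change), change.shape)
        if change[i, j] >= -1e-12:
            break            # no single-coordinate move lowers the loss
        g += delta[i, j] * H[:, i]   # keep g consistent with the new q
        q[i] = grid[j]
    return q

# Usage on synthetic data (shapes and seed are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 16))    # stand-in for calibration inputs
w = rng.standard_normal(16)          # one column of a weight matrix
H = X.T @ X
grid = make_grid(w, bits=3)
q_rtn = round_to_nearest(w, grid)
q_cd = greedy_cd_quantize(w, H, grid)
print(recon_loss(w, q_rtn, H), recon_loss(w, q_cd, H))
```

Each iteration scores every (coordinate, grid level) pair in closed form and applies only the single best move, so the loss never increases relative to the starting point. A full weight matrix could be handled by running the same routine per output column, which is again an assumption of this sketch rather than a detail stated on this page.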

Taxonomy

Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning