CDQuant: Greedy Coordinate Descent for Accurate LLM Quantization
Pranav Ajit Nair, Arun Sai Suggala

TL;DR
CDQuant is a scalable greedy coordinate descent algorithm for LLM quantization that outperforms GPTQ and enhances existing PTQ methods, enabling efficient compression of billion-parameter models with minimal performance loss.
Contribution
We introduce CDQuant, a simple, scalable, and more effective alternative to GPTQ for post-training quantization of large language models.
Findings
CDQuant outperforms GPTQ in 2-4 bit weight quantization.
CDQuant improves the performance of PTQ techniques like QuIP and FrameQuant.
CDQuant scales efficiently to models with hundreds of billions of parameters.
Abstract
Large language models (LLMs) have recently demonstrated remarkable performance across diverse language tasks, but their deployment is often constrained by substantial computational and storage requirements. Quantization has emerged as a key technique for addressing this challenge, enabling the compression of large models with minimal impact on performance. The recent GPTQ algorithm, a post-training quantization (PTQ) method, has proven highly effective for compressing LLMs, sparking a wave of research that leverages GPTQ as a core component. Recognizing the pivotal role of GPTQ in the PTQ landscape, we introduce CDQuant, a simple and scalable alternative to GPTQ with improved performance. CDQuant uses greedy coordinate descent to minimize the layer-wise reconstruction loss, yielding high-quality quantized weights. Our algorithm is easy to implement and scales efficiently to…
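To make the idea of greedy coordinate descent on a layer-wise reconstruction loss concrete, here is a minimal sketch for a single output column. It is an illustration under stated assumptions, not the authors' implementation: the loss is taken to be (q - w)^T H (q - w) with H = X^T X, the quantization grid is a fixed uniform grid, the starting point is round-to-nearest, and the names `greedy_cd_quantize` and `recon_loss` are hypothetical.

```python
import numpy as np

def recon_loss(w, q, H):
    """Quadratic reconstruction loss (q - w)^T H (q - w), with H = X^T X."""
    e = q - w
    return float(e @ H @ e)

def greedy_cd_quantize(w, H, grid, max_iters=1000):
    """Greedy coordinate descent over a fixed grid (illustrative sketch).

    w: (d,) full-precision weights, H: (d, d) symmetric PSD matrix,
    grid: (k,) allowed quantized values (assumed, not taken from the paper).
    """
    # Assumption: initialize with round-to-nearest on the grid.
    q = grid[np.abs(w[:, None] - grid[None, :]).argmin(axis=1)]
    r = H @ (q - w)          # r = H (q - w), i.e. half the gradient
    diag = np.diag(H)
    for _ in range(max_iters):
        # Loss change from moving coordinate i to grid value c:
        #   2 * delta * r_i + delta^2 * H_ii, where delta = c - q_i
        delta = grid[None, :] - q[:, None]                  # (d, k)
        change = 2.0 * delta * r[:, None] + delta**2 * diag[:, None]
        i, c = np.unravel_index(np.argmin(change), change.shape)
        if change[i, c] >= -1e-12:   # no single-coordinate move improves the loss
            break
        step = delta[i, c]
        q[i] = grid[c]
        r += step * H[:, i]          # keep r consistent with the updated q
    return q

# Toy data (synthetic; for illustration only).
rng = np.random.default_rng(0)
X = rng.standard_normal((256, 16))
w = rng.standard_normal(16)
H = X.T @ X
grid = np.linspace(-2.0, 2.0, 8)     # assumed 3-bit uniform grid
q_rtn = grid[np.abs(w[:, None] - grid[None, :]).argmin(axis=1)]
q_cd = greedy_cd_quantize(w, H, grid)
print(recon_loss(w, q_rtn, H), recon_loss(w, q_cd, H))
```

Because every accepted move strictly lowers the loss, the result of this sketch is never worse than its round-to-nearest starting point on the same grid. How CDQuant itself initializes, selects coordinates, or batches updates is not specified in the text above.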
Taxonomy
Topics: Machine Learning and Data Classification · Machine Learning in Healthcare · Domain Adaptation and Few-Shot Learning
