ICQuant: Index Coding enables Low-bit LLM Quantization
Xinlin Li, Osama Hanna, Christina Fragouli, Suhas Diggavi

TL;DR
ICQuant introduces an efficient index coding scheme leveraging outlier statistics to significantly improve low-bit quantization of LLM weights, enabling high accuracy with minimal bit overhead.
Contribution
The paper proposes ICQuant, a novel outlier-aware quantization framework that reduces bit overhead and enhances low-bit LLM quantization without fine-tuning.
Findings
ICQuant requires only 0.3 bits per weight for outlier suppression.
ICQuant improves zero-shot accuracy of 2-bit Llama3-70B by up to 150%.
ICQuant achieves comparable performance to fine-tuned quantizers without additional training.
Abstract
The rapid deployment of Large Language Models (LLMs) highlights the need for efficient low-bit post-training quantization (PTQ), due to their high memory costs. A key challenge in weight quantization is the presence of outliers, which inflate quantization ranges and lead to large errors. While a number of outlier suppression techniques have been proposed, they either fail to effectively shrink the quantization range or incur (relatively) high bit overhead. In this paper, we present ICQuant, a novel framework that leverages outlier statistics to design an efficient index coding scheme for outlier-aware weight-only quantization. Compared to existing outlier suppression techniques requiring ≈1 bit of overhead to halve the quantization range, ICQuant requires only ≈0.3 bits; a significant saving in extreme compression regimes (e.g., 2-3 bits per weight). ICQuant can be used…
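The two ideas in the abstract, that outliers inflate the quantization range and that outlier positions can be stored cheaply, can be illustrated with a toy sketch. This is not the paper's implementation: the Gaussian weights, the 5% outlier fraction, the 5-bit gap width, the escape-symbol convention, and all function names (`uniform_quantize`, `encode_gaps`, `decode_gaps`) are assumptions chosen for illustration only.

```python
import random

def uniform_quantize(values, bits):
    """Round each value to the nearest of 2**bits evenly spaced levels on [min, max]."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / (2 ** bits - 1)
    if step == 0:
        return list(values)
    return [lo + round((v - lo) / step) * step for v in values]

def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def encode_gaps(positions, gap_bits):
    """Store sorted positions as fixed-width gaps (hypothetical scheme).
    Symbols 0..2**gap_bits-2 mean 'gap - 1'; the all-ones symbol is an escape
    that advances the cursor by 2**gap_bits - 1 without marking an outlier."""
    escape = 2 ** gap_bits - 1
    symbols, prev = [], -1
    for p in positions:
        gap = p - prev
        while gap > escape:
            symbols.append(escape)
            gap -= escape
        symbols.append(gap - 1)
        prev = p
    return symbols

def decode_gaps(symbols, gap_bits):
    escape = 2 ** gap_bits - 1
    positions, cursor = [], -1
    for s in symbols:
        if s == escape:
            cursor += escape
        else:
            cursor += s + 1
            positions.append(cursor)
    return positions

# Assumed toy setup: i.i.d. Gaussian "weights", top 5% by magnitude are outliers.
random.seed(0)
n, bits, gap_bits, outlier_frac = 4096, 2, 5, 0.05
weights = [random.gauss(0.0, 1.0) for _ in range(n)]

k = int(outlier_frac * n)
by_magnitude = sorted(range(n), key=lambda i: abs(weights[i]), reverse=True)
outlier_pos = sorted(by_magnitude[:k])
outlier_set = set(outlier_pos)
inlier_pos = [i for i in range(n) if i not in outlier_set]

# Baseline: one quantizer whose range is stretched by the outliers.
full_mse = mse(weights, uniform_quantize(weights, bits))

# Outlier-aware: inliers and outliers each get their own range.
split = [0.0] * n
for group in (inlier_pos, outlier_pos):
    quantized = uniform_quantize([weights[i] for i in group], bits)
    for i, v in zip(group, quantized):
        split[i] = v
split_mse = mse(weights, split)

# Cost of remembering which weights are outliers, in bits per weight.
symbols = encode_gaps(outlier_pos, gap_bits)
index_bits_per_weight = len(symbols) * gap_bits / n

print(f"MSE, single range:      {full_mse:.4f}")
print(f"MSE, split ranges:      {split_mse:.4f}")
print(f"index overhead (b/w):   {index_bits_per_weight:.3f}  (a plain flag costs 1.0)")
```

In this toy setting, splitting off the outliers lowers the reconstruction error, and the gap-coded positions cost well under the 1 bit per weight that a per-weight flag would need. The actual coding scheme and parameters used by ICQuant are described in the paper itself.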
Taxonomy
Topics: Advanced Data Compression Techniques · Advanced Data Storage Technologies · Error Correcting Code Techniques
