ICQuant: Index Coding enables Low-bit LLM Quantization

Xinlin Li; Osama Hanna; Christina Fragouli; Suhas Diggavi

arXiv:2505.00850 · cs.LG · August 26, 2025

Open Access

TL;DR

ICQuant introduces an efficient index coding scheme leveraging outlier statistics to significantly improve low-bit quantization of LLM weights, enabling high accuracy with minimal bit overhead.

Contribution

The paper proposes ICQuant, a novel outlier-aware quantization framework that reduces bit overhead and enhances low-bit LLM quantization without fine-tuning.

Findings

01

ICQuant needs only about 0.3 bits per weight of overhead to halve the quantization range, compared with about 1 bit for existing outlier suppression techniques.

02

ICQuant improves zero-shot accuracy of 2-bit Llama3-70B by up to 150%.

03

ICQuant achieves comparable performance to fine-tuned quantizers without additional training.
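
To make the bit-overhead figure in the first finding concrete, the sketch below counts the cost of storing outlier positions with a generic gap code. It is an illustration under stated assumptions, not the paper's scheme: the 5% outlier fraction, the 6-bit gap width, the escape-symbol convention, and the names `encode_gaps` and `decode_gaps` are all hypothetical choices made for this example.

```python
import math
import random


def binary_entropy(p: float) -> float:
    """Entropy (bits) of a Bernoulli(p) source: a lower bound on the
    bits per weight needed to mark which positions are outliers."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def encode_gaps(outlier_idx, gap_bits):
    """Encode sorted, unique outlier positions as fixed-width gap symbols.

    Assumed convention: a symbol s < escape means "next outlier is s + 1
    positions ahead"; the all-ones symbol (escape) means "skip `escape`
    positions, no outlier yet".
    """
    escape = (1 << gap_bits) - 1
    symbols = []
    prev = -1
    for idx in sorted(outlier_idx):
        gap = idx - prev
        while gap > escape:  # gap too large for one symbol
            symbols.append(escape)
            gap -= escape
        symbols.append(gap - 1)
        prev = idx
    return symbols


def decode_gaps(symbols, gap_bits):
    """Invert encode_gaps and recover the outlier positions."""
    escape = (1 << gap_bits) - 1
    pos = -1
    positions = []
    for s in symbols:
        if s == escape:
            pos += escape
        else:
            pos += s + 1
            positions.append(pos)
    return positions


random.seed(0)
n_weights = 100_000
outlier_frac = 0.05  # assumed fraction of outliers
gap_bits = 6         # assumed symbol width

outliers = sorted(random.sample(range(n_weights), int(outlier_frac * n_weights)))
symbols = encode_gaps(outliers, gap_bits)
overhead = len(symbols) * gap_bits / n_weights  # bits per weight

print(f"entropy bound : {binary_entropy(outlier_frac):.3f} bits/weight")
print(f"gap code cost : {overhead:.3f} bits/weight")
```

Under these assumptions the gap code lands near the entropy bound for a 5% outlier rate, which is the same order as the overhead quoted above. A 1-bit-per-weight flag, by comparison, pays the full bit regardless of how rare the outliers are.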

Abstract

The rapid deployment of Large Language Models (LLMs) highlights the need for efficient low-bit post-training quantization (PTQ), due to their high memory costs. A key challenge in weight quantization is the presence of outliers, which inflate quantization ranges and lead to large errors. While a number of outlier suppression techniques have been proposed, they either fail to effectively shrink the quantization range or incur (relatively) high bit overhead. In this paper, we present ICQuant, a novel framework that leverages outlier statistics to design an efficient index coding scheme for outlier-aware weight-only quantization. Compared to existing outlier suppression techniques requiring $\approx 1$ bit overhead to halve the quantization range, ICQuant requires only $\approx 0.3$ bits, a significant saving in extreme compression regimes (e.g., 2-3 bits per weight). ICQuant can be used…
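
The abstract's point that outliers inflate the quantization range can be seen in a small experiment. The sketch below is a toy illustration, not ICQuant itself: it assumes Gaussian weights, a plain min/max uniform quantizer, 2 bits per weight, and a 5% outlier fraction chosen by magnitude, and the helper names are hypothetical.

```python
import random


def uniform_quantize(values, bits):
    """Round each value to the nearest of 2**bits evenly spaced levels
    spanning [min(values), max(values)]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return list(values)
    step = (hi - lo) / ((1 << bits) - 1)
    return [lo + round((v - lo) / step) * step for v in values]


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(20_000)]  # assumed weight distribution
bits = 2
outlier_frac = 0.05  # assumed outlier fraction

# Baseline: one range covers every weight, so the few outliers set the step size.
mse_single = mse(weights, uniform_quantize(weights, bits))

# Outlier-aware: the largest-magnitude weights get their own range.
k = int(outlier_frac * len(weights))
by_magnitude = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
outlier_set = set(by_magnitude[:k])
inliers = [w for i, w in enumerate(weights) if i not in outlier_set]
outliers = [w for i, w in enumerate(weights) if i in outlier_set]

q_in = iter(uniform_quantize(inliers, bits))
q_out = iter(uniform_quantize(outliers, bits))
split = [next(q_out) if i in outlier_set else next(q_in) for i in range(len(weights))]
mse_split = mse(weights, split)

full_range = max(weights) - min(weights)
inlier_range = max(inliers) - min(inliers)

print(f"full range   : {full_range:.2f}, MSE {mse_single:.3f}")
print(f"inlier range : {inlier_range:.2f}, MSE {mse_split:.3f}")
```

Separating the outliers shrinks the range that the bulk of the weights must share, which lowers the error at the same bit width. The price is that the decoder must know which positions are outliers, and that side information is the overhead an index coding scheme aims to keep small.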

Taxonomy

Topics: Advanced Data Compression Techniques · Advanced Data Storage Technologies · Error Correcting Code Techniques