Bit-by-Bit: Progressive QAT Strategy with Outlier Channel Splitting for Stable Low-Bit LLMs

Binxing Xu; Hao Gu; Lujun Li; Hao Wang; Bei Liu; Jiacheng Liu; Qiyuan Zhu; Xintong Yang; Chao Li; Sirui Han; Yike Guo

arXiv:2604.07888 · cs.LG · April 10, 2026

TL;DR

This paper introduces Bit-by-Bit, a progressive quantization-aware training (QAT) framework with outlier channel splitting. It targets stable low-bit LLM training, lets one trained model be deployed at multiple bit-widths, and comes with custom 2-bit kernels reported to reach up to 11× speedup.

Contribution

The paper proposes a progressive QAT method with outlier channel splitting that stabilizes ultra-low-precision LLM training, supports deployment at multiple bit-widths from a single trained model, and is paired with custom low-bit kernels for efficiency.

Findings

01. Outperforms baselines such as BitDistiller and EfficientQAT on Llama models.

02. Achieves up to 11× speedup with custom 2-bit kernels.

03. Limits perplexity degradation to a 2.25 PPL increase on WikiText2.

Abstract

Training LLMs at ultra-low precision remains a formidable challenge. Direct low-bit QAT often suffers from convergence instability and substantial training costs, exacerbated by quantization noise from heavy-tailed outlier channels and error accumulation across layers. To address these issues, we present Bit-by-Bit, a progressive QAT framework with outlier channel splitting. Our approach integrates three key components: (1) block-wise progressive training that reduces precision stage by stage, ensuring stable initialization for low-bit optimization; (2) nested structure of integer quantization grids to enable a "train once, deploy any precision" paradigm, allowing a single model to support multiple bit-widths without retraining; (3) rounding-aware outlier channel splitting, which mitigates quantization error while acting as an identity transform that preserves the quantized outputs.…
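Components (2) and (3) of the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names, the unsigned grid, the use of bit truncation to derive lower-bit codes, and the integer floor/ceil split are all assumptions chosen for illustration. The channel split shown is the classic form of outlier channel splitting (duplicate an input channel, halve its weights), and `split_code` is one plausible reading of "rounding-aware".

```python
import math

# --- Nested integer grids (hypothetical construction) ---

def quantize(w, scale, bits):
    """Round a non-negative weight to an unsigned integer code, clamped to the grid."""
    qmax = (1 << bits) - 1
    return max(0, min(qmax, math.floor(w / scale + 0.5)))

def grid_levels(bits, base_bits):
    """Grid points of a `bits`-wide grid, as integer multiples of the base step."""
    shift = base_bits - bits
    return [code << shift for code in range(1 << bits)]

def narrow(code, base_bits, bits):
    """Derive a lower-bit code by dropping least significant bits (truncation)."""
    return code >> (base_bits - bits)

# --- Outlier channel splitting (classic form, assumed here) ---

def split_channel(weights, inputs, j):
    """Split input channel j of y = W x.

    Duplicates inputs[j] and halves column j of W, so W x is unchanged
    while the magnitudes in that column are cut in half.
    """
    new_w = [row + [row[j] / 2] for row in weights]  # append the second half
    for row in new_w:
        row[j] = row[j] / 2                          # shrink the original column
    return new_w, inputs + [inputs[j]]

def split_code(q):
    """Split an integer code into two parts that sum back to q exactly.

    Halving 7 and rounding each half independently gives 4 + 4 = 8;
    a floor/ceil split gives 3 + 4 = 7, so the quantized sum is preserved.
    """
    low = q // 2
    return low, q - low

def matvec(weights, inputs):
    """Plain matrix-vector product on nested lists."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weights]
```

Under these assumptions, `grid_levels(2, 4)` is a subset of `grid_levels(4, 4)`, which is what allows one set of 4-bit codes to serve a 2-bit deployment, and `split_channel` leaves `matvec` outputs unchanged while halving the outlier column.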
