StableQAT: Stable Quantization-Aware Training at Ultra-Low Bitwidths

Tianyi Chen; Sihan Chen; Xiaoyi Qu; Dan Zhao; Ruomei Yan; Jongwoo Ko; Luming Liang; Pashmina Cameron

arXiv:2601.19320 · cs.LG · February 19, 2026


Open Access

TL;DR

StableQAT introduces a theoretically grounded surrogate for backpropagation in quantization-aware training, significantly enhancing stability and performance at ultra-low bitwidths with minimal overhead.

Contribution

It proposes a gradient surrogate derived from a discrete Fourier analysis of the rounding operator. The surrogate generalizes the straight-through estimator (STE) and improves stability and efficiency in ultra-low-bitwidth QAT.
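The link between Fourier analysis and STE can be seen from the classical Fourier expansion of the rounding operator. The identity below is standard. The truncation order K is introduced here only for illustration, and StableQAT's actual surrogate family may weight or bound the harmonics differently.

```latex
\begin{aligned}
\operatorname{round}(x) &= x - \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{\pi k}\,\sin(2\pi k x),
  \qquad x \notin \mathbb{Z} + \tfrac{1}{2},\\
\frac{d}{dx}\Big[\,x - \sum_{k=1}^{K} \frac{(-1)^{k+1}}{\pi k}\,\sin(2\pi k x)\Big]
  &= 1 + 2\sum_{k=1}^{K} (-1)^{k}\cos(2\pi k x).
\end{aligned}
```

Setting K = 0 leaves an empty sum, so the surrogate gradient is identically 1. This is exactly the STE gradient. Larger K tracks the staircase shape of rounding more closely.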

Findings

01. StableQAT achieves stable training at 2-4 bits.

02. It outperforms standard QAT techniques in robustness and accuracy.

03. Training overhead remains negligible.

Abstract

Quantization-aware training (QAT) is essential for deploying large models under strict memory and latency constraints, yet achieving stable and robust optimization at ultra-low bitwidths remains challenging. Common approaches based on the straight-through estimator (STE) or soft quantizers often suffer from gradient mismatch, instability, or high computational overhead. To address this, we propose StableQAT, a unified and efficient QAT framework that stabilizes training in ultra-low-bit settings via a novel, lightweight, and theoretically grounded surrogate for backpropagation derived from a discrete Fourier analysis of the rounding operator. StableQAT strictly generalizes STE, as the latter arises as a special case of our more expressive surrogate family, yielding smooth, bounded, and inexpensive gradients that improve QAT training performance and stability across various hyperparameter…
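To make the STE connection concrete, here is a minimal NumPy sketch of a fake quantizer. Its backward pass uses the derivative of the truncated Fourier series of rounding. Everything here is an illustrative assumption and not StableQAT's implementation. That includes the signed uniform quantizer, the hypothetical helper names (`fourier_round`, `fourier_round_grad`, `fake_quant_forward`, `fake_quant_backward`), the zero gradient outside the clipping range, and the plain, unweighted truncation order `num_terms`. With `num_terms=0` the backward pass reduces to the usual clipped STE.

```python
import numpy as np


def fourier_round(x, num_terms):
    """Truncated Fourier series of round(x) with `num_terms` harmonics (hypothetical helper).

    Uses x - round(x) = sum_{k>=1} (-1)^(k+1) * sin(2*pi*k*x) / (pi*k),
    which holds away from the half-integer jump points.
    """
    x = np.asarray(x, dtype=np.float64)
    k = np.arange(1, num_terms + 1)                 # harmonics 1..K (empty if K == 0)
    signs = (-1.0) ** (k + 1)
    phases = 2 * np.pi * np.multiply.outer(x, k)    # shape x.shape + (K,)
    return x - (signs * np.sin(phases) / (np.pi * k)).sum(axis=-1)


def fourier_round_grad(x, num_terms):
    """Derivative of fourier_round: 1 + 2 * sum_k (-1)^k * cos(2*pi*k*x)."""
    x = np.asarray(x, dtype=np.float64)
    k = np.arange(1, num_terms + 1)
    signs = (-1.0) ** k
    phases = 2 * np.pi * np.multiply.outer(x, k)
    return 1.0 + 2.0 * (signs * np.cos(phases)).sum(axis=-1)


def _qrange(bits):
    # Assumed signed integer grid, e.g. bits=2 -> [-2, 1]
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fake_quant_forward(w, scale, bits):
    """Quantize-dequantize: round to nearest (ties to even, as np.round does), clip, rescale."""
    qmin, qmax = _qrange(bits)
    return np.clip(np.round(np.asarray(w, dtype=np.float64) / scale), qmin, qmax) * scale


def fake_quant_backward(w, scale, bits, grad_out, num_terms):
    """Surrogate gradient dL/dw; num_terms=0 gives the clipped STE."""
    qmin, qmax = _qrange(bits)
    x = np.asarray(w, dtype=np.float64) / scale
    inside = (x >= qmin) & (x <= qmax)   # zero gradient where the clip is active
    # d(scale * round(w / scale)) / dw = round'(x), so the scale cancels.
    return np.asarray(grad_out) * fourier_round_grad(x, num_terms) * inside


# Usage: 2-bit weights with an assumed scale of 0.25.
w = np.array([0.1, 0.5, 5.0])
w_q = fake_quant_forward(w, scale=0.25, bits=2)           # dequantized values 0.0, 0.25, 0.25
g_ste = fake_quant_backward(w, 0.25, 2, np.ones(3), 0)    # 1 inside the clip range, 0 outside
g_k3 = fake_quant_backward(w, 0.25, 2, np.ones(3), 3)     # three-harmonic surrogate
```

For `num_terms >= 1`, the raw truncated derivative is a shifted Dirichlet kernel. It peaks at the half-integer rounding thresholds (value 3 at x = 0.5 with one harmonic) and dips to -1 at the integers. A practical surrogate would therefore need some smoothing or bounding to give the "smooth, bounded" gradients the abstract describes. How StableQAT achieves that is not specified in this summary.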


Taxonomy

Topics: Advanced Neural Network Applications · Advanced Data Compression Techniques · Domain Adaptation and Few-Shot Learning