WinQ: Accelerating Quantization-Aware Training of Language Models Around Saddle Points

Dongyue Li; Zechun Liu; Kai Yi; Zhenshuo Zhang; Changsheng Zhao; Raghuraman Krishnamoorthi; Harshit Khaitan; Hongyang R. Zhang; Steven Li

arXiv:2605.17471·cs.LG·May 20, 2026

TL;DR

This paper analyzes the convergence issues in quantization-aware training of language models and proposes WinQ, an algorithm that accelerates training and improves quantization performance, especially at low bit-widths.

Contribution

The paper introduces WinQ, a method that accelerates quantization-aware training through weight resetting and gradient regularization, yielding speedups of up to 4 times and accuracy improvements of up to 8.8% at sub-4-bit quantization.

Findings

01 WinQ accelerates QAT by up to 4 times across various models and methods.

02 WinQ improves sub-4-bit quantization accuracy by up to 8.8%.

03 Hessian spectrum analysis reveals weights converge to saddle points with eigenvalues near zero.
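
The saddle-point criterion behind the last finding can be illustrated on a toy problem. The sketch below is not the paper's estimator (a language model would need a stochastic spectrum estimate rather than a dense Hessian); it assumes a hypothetical two-parameter loss, builds its Hessian by central finite differences, and checks for eigenvalues of both signs. All function names are illustrative.

```python
import math


def toy_loss(w):
    """Hypothetical 2-parameter loss with a saddle at the origin.

    Curvature is +2 along x and only -0.02 along y, so the surface
    is nearly flat in the descending direction.
    """
    x, y = w
    return x * x - 0.01 * y * y


def hessian_2d(f, w, eps=1e-3):
    """Central finite-difference Hessian of f at w (2 parameters only)."""
    def shifted(dx, dy):
        return f((w[0] + dx, w[1] + dy))

    f0 = shifted(0.0, 0.0)
    hxx = (shifted(eps, 0.0) - 2.0 * f0 + shifted(-eps, 0.0)) / eps ** 2
    hyy = (shifted(0.0, eps) - 2.0 * f0 + shifted(0.0, -eps)) / eps ** 2
    hxy = (shifted(eps, eps) - shifted(eps, -eps)
           - shifted(-eps, eps) + shifted(-eps, -eps)) / (4.0 * eps ** 2)
    return hxx, hxy, hyy


def eigenvalues_sym_2x2(hxx, hxy, hyy):
    """Closed-form eigenvalues (smallest, largest) of [[hxx, hxy], [hxy, hyy]]."""
    mean = (hxx + hyy) / 2.0
    radius = math.hypot((hxx - hyy) / 2.0, hxy)
    return mean - radius, mean + radius


def is_saddle(eigs, tol=1e-6):
    """A critical point is a saddle if the Hessian has eigenvalues of both signs."""
    return min(eigs) < -tol and max(eigs) > tol


eigs = eigenvalues_sym_2x2(*hessian_2d(toy_loss, (0.0, 0.0)))
print("eigenvalues:", eigs, "saddle:", is_saddle(eigs))
```

The small negative eigenvalue is the point of the example: gradient descent escapes along that direction only slowly, which is one way a flat saddle region can look like an early plateau.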

Abstract

Quantization-aware training (QAT) is widely adopted to quantize language models by training full-precision weights using gradients from the quantized model. Its main bottleneck is slow convergence and an early performance plateau, particularly at bit-widths below 4. While this problem has been observed in prior work, its precise cause remains unclear. In this paper, we analyze the convergence of QAT by estimating the spectrum of the loss-surface Hessians. We find that the weights converge to flat regions around saddle points, where the Hessian has large fractions of both positive and negative eigenvalues. During training, an increasing fraction of Hessian eigenvalues concentrates around zero, and their magnitudes decrease. At lower bit-widths, the magnitude of eigenvalues in the Hessian spectrum is significantly smaller. To mitigate these issues, we propose an algorithm called WinQ to…
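
The first sentence of the abstract, training full-precision weights with gradients from the quantized model, can be sketched as a single update step. This is a minimal illustration and not WinQ itself: it assumes a symmetric uniform quantizer, a one-weight squared-error loss, and a straight-through estimator (a common choice in QAT that the abstract does not name). Every name and parameter here is hypothetical.

```python
def fake_quantize(w, bits, max_abs):
    """Snap w to a symmetric uniform grid with 2**bits - 1 levels in [-max_abs, max_abs].

    Assumed quantizer for illustration. Python's round() rounds exact
    ties to the nearest even integer.
    """
    qmax = 2 ** (bits - 1) - 1           # largest integer level, e.g. 1 for 2 bits
    scale = max_abs / qmax               # step size between adjacent levels
    q = max(-qmax, min(qmax, round(w / scale)))  # round, then clip to the grid
    return q * scale


def qat_step(w, x, y, bits, max_abs, lr):
    """One QAT update of a single full-precision weight w.

    Forward pass uses the quantized weight; the gradient taken with
    respect to the quantized weight is applied to w unchanged
    (straight-through estimator: d w_q / d w is treated as 1).
    """
    w_q = fake_quantize(w, bits, max_abs)
    pred = w_q * x
    grad_wq = 2.0 * (pred - y) * x       # derivative of (pred - y)**2 w.r.t. w_q
    return w - lr * grad_wq


w = 0.3
for step in range(3):
    w = qat_step(w, x=1.0, y=1.0, bits=2, max_abs=1.0, lr=0.1)
    print(step, w, fake_quantize(w, bits=2, max_abs=1.0))
```

With 2 bits the grid has only three levels, so the quantized weight stays fixed until the full-precision weight crosses a rounding threshold. That coarse response is specific to this toy setup and is shown only to make the mechanics concrete.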
