What Makes Low-Bit Quantization-Aware Training Work for Reasoning LLMs? A Systematic Study

Keyu Lv; Manyi Zhang; Xiaobo Xia; Jingchen Ni; Shannan Yan; Xianzhi Yu; Lu Hou; Chun Yuan; Haoli Bai

arXiv:2601.14888·cs.LG·January 22, 2026

TL;DR

This paper systematically studies quantization-aware training (QAT) for reasoning large language models. It shows that knowledge distillation, PTQ-based initialization, and domain alignment between calibration and training data improve accuracy and reduce training cost in low-bit settings, outperforming existing methods.

Contribution

The paper introduces an optimized Reasoning-QAT workflow that improves low-bit quantization of reasoning LLMs by combining its findings on distillation, initialization, and domain alignment.

Findings

01

Knowledge distillation is a robust QAT objective for reasoning models, whether they were trained via supervised fine-tuning or reinforcement learning.

02

PTQ provides strong initialization for QAT.

03

Reinforcement learning remains feasible with quantized models, given a viable cold start.
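
To make finding 02 concrete, here is a minimal sketch of the simplest kind of PTQ starting point: a per-tensor scale calibrated from the weights themselves, followed by round-to-nearest "fake" quantization. This is an illustration under stated assumptions, not the paper's method. The absmax calibration rule, the symmetric 4-bit grid, and the function names `absmax_scale` and `fake_quantize` are all hypothetical choices made for this example; the page does not say which PTQ method the authors use.

```python
def absmax_scale(weights, bits=4):
    """Per-tensor symmetric scale (assumed rule): the largest magnitude
    maps onto the largest positive integer level."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    return max_abs / qmax if max_abs > 0 else 1.0


def fake_quantize(weights, scale, bits=4):
    """Quantize to the signed integer grid, then dequantize back to floats.

    QAT typically trains through an operation like this one, so the model
    sees quantization error during training.
    """
    qmax = 2 ** (bits - 1) - 1
    qmin = -qmax - 1
    dequantized = []
    for w in weights:
        q = round(w / scale)           # round to nearest (Python rounds exact ties to even)
        q = max(qmin, min(qmax, q))    # clamp to the representable range
        dequantized.append(q * scale)
    return dequantized


weights = [3.5, -1.5, 0.6, 0.0]
scale = absmax_scale(weights, bits=4)        # 3.5 / 7 = 0.5
quantized = fake_quantize(weights, scale, bits=4)
```

In this sketch the calibrated `scale` plays the role of the initialization: a QAT run could start from it instead of from an arbitrary value, which is the general idea behind using PTQ as a starting point.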

Abstract

Reasoning models excel at complex tasks such as coding and mathematics, yet their inference is often slow and token-inefficient. Post-training quantization (PTQ) can improve inference efficiency, but it usually comes at the cost of large accuracy drops, especially for reasoning tasks under low-bit settings. In this study, we present a systematic empirical study of quantization-aware training (QAT) for reasoning models. Our key findings include: (1) Knowledge distillation is a robust objective for reasoning models trained via either supervised fine-tuning or reinforcement learning; (2) PTQ provides a strong initialization for QAT, improving accuracy while reducing training cost; (3) Reinforcement learning remains feasible for quantized models given a viable cold start and yields additional gains; and (4) Aligning the PTQ calibration domain with the QAT training domain accelerates…
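
As a rough illustration of the knowledge-distillation objective named in finding (1), the sketch below computes a per-token distillation loss between a full-precision teacher and a quantized student. It assumes a forward KL divergence over softmax distributions with an optional temperature; the abstract does not specify the exact form of the loss, and the names `log_softmax` and `kd_loss` are hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Numerically stable log-softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    log_sum_exp = m + math.log(sum(math.exp(x - m) for x in scaled))
    return [x - log_sum_exp for x in scaled]


def kd_loss(teacher_logits, student_logits, temperature=1.0):
    """Forward KL(teacher || student) for one token position (assumed form)."""
    log_p = log_softmax(teacher_logits, temperature)   # full-precision teacher
    log_q = log_softmax(student_logits, temperature)   # quantized student
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


teacher = [2.0, 0.0, -1.0]
matched_student = [2.0, 0.0, -1.0]
drifted_student = [0.0, 0.0, 0.0]

loss_matched = kd_loss(teacher, matched_student)   # zero when distributions agree
loss_drifted = kd_loss(teacher, drifted_student)   # positive when they differ
```

A loss of this kind needs only the teacher's output distribution, not ground-truth labels or rewards, which may be one reason it applies equally to models originally trained with supervised fine-tuning and with reinforcement learning.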

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Machine Learning and Data Classification · Advanced Neural Network Applications