The Impact of Quantization on Large Reasoning Model Reinforcement Learning

Medha Kumar; Zifei Xu; Xin Wang; Tristan Webb

arXiv:2511.15694·cs.LG·November 20, 2025

Open Access

TL;DR

This paper investigates how different quantization methods affect the reasoning performance of large reasoning models trained with reinforcement learning, revealing that quantization-aware training hampers learning while post-training quantization preserves performance.

Contribution

It presents a systematic experimental analysis of how quantization affects RL-trained large reasoning models, and reports that post-training quantization yields better reasoning performance than quantization-aware RL training.

Findings

01

Quantization-aware RL training reduces reasoning performance.

02

Post-training quantization maintains higher reasoning accuracy.

03

PTQ and QLoRA both lead to better performance than quantization-aware RL training.

Abstract

Strong reasoning capabilities can now be achieved by large-scale reinforcement learning (RL) without any supervised fine-tuning. Although post-training quantization (PTQ) and quantization-aware training (QAT) are well studied in the context of fine-tuning, how quantization impacts RL in large reasoning models (LRMs) remains an open question. To answer this question, we conducted systematic experiments and discovered a significant gap in reasoning performance on mathematical benchmarks between post-RL quantized models and their quantization-aware RL optimized counterparts. Our findings suggest that quantization-aware RL training negatively impacted the learning process, whereas PTQ and QLoRA led to greater performance.
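
The distinction the abstract draws between quantizing after training (PTQ) and quantizing during training (QAT) can be illustrated with a toy sketch. This is not the paper's setup: it fits a two-parameter least-squares model rather than running RL on a reasoning model, and it assumes symmetric per-tensor round-to-nearest quantization with a straight-through gradient update, which is one common way to implement QAT. The names `fake_quantize` and `train` are hypothetical helpers introduced only for this illustration.

```python
def fake_quantize(weights, bits=4):
    """Quantize then dequantize with symmetric per-tensor round-to-nearest.

    Assumption: this scheme is chosen for illustration only; the paper's
    actual quantization formats are not stated in this summary.
    """
    qmax = 2 ** (bits - 1) - 1              # e.g. 7 for signed 4-bit
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)                # nothing to scale
    scale = max_abs / qmax
    out = []
    for w in weights:
        q = round(w / scale)                # nearest integer level
        q = max(-qmax, min(qmax, q))        # clamp to the signed range
        out.append(q * scale)               # map back to a float value
    return out


def train(xs, ys, steps=500, lr=0.05, quantize_during_training=False, bits=4):
    """Fit y ~ w[0] * x + w[1] by gradient descent on mean squared error."""
    w = [0.0, 0.0]                          # full-precision "latent" weights
    n = len(xs)
    for _ in range(steps):
        # QAT-style: the forward pass sees quantized weights.
        fw = fake_quantize(w, bits) if quantize_during_training else w
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = fw[0] * x + fw[1] - y
            g0 += 2.0 * err * x / n
            g1 += 2.0 * err / n
        # Straight-through update: gradients computed at the quantized
        # weights are applied directly to the full-precision weights.
        w = [w[0] - lr * g0, w[1] - lr * g1]
    return w


xs = [0.0, 1.0, 2.0, 3.0]
ys = [0.5, 2.5, 4.5, 6.5]                   # generated from y = 2x + 0.5

# PTQ: train in full precision, quantize once at the end.
ptq_weights = fake_quantize(train(xs, ys), bits=4)

# QAT: quantize inside every training step, then quantize for deployment.
qat_weights = fake_quantize(train(xs, ys, quantize_during_training=True), bits=4)
```

In both paths the deployed weights live on the same low-precision grid; the difference is whether the optimizer ever sees quantization noise. Which path performs better on this toy problem says nothing about the RL setting studied in the paper.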

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning