QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals

Nan Zhang; Eugene Kwek; Yusen Zhang; Muyu Pan; Suhang Wang; Prasenjit Mitra; Rui Zhang

arXiv:2602.02581·cs.LG·February 4, 2026

TL;DR

QuantLRM is a weight-only quantization method for large reasoning models that uses the magnitude of weight updates from fine-tuning to score channel importance, improving quantized-model performance across multiple reasoning benchmarks.

Contribution

This paper proposes QuantLRM, a quantization approach that uses the magnitude of weight updates from fine-tuning as its signal. Its channel importance measure is reported to be more effective than measures based on activation or second-order information.

Findings

01. QuantLRM improves quantization performance by an average of 6.55% on RL fine-tuned models.

02. The method is effective across various fine-tuning types and reasoning benchmarks.

03. Pseudo-fine-tuning signals enable QuantLRM to work well even without actual fine-tuning.

Abstract

Weight-only quantization is important for compressing Large Language Models (LLMs). Inspired by the spirit of classical magnitude pruning, we study whether the magnitude of weight updates during reasoning-incentivized fine-tuning can provide valuable signals for quantizing Large Reasoning Models (LRMs). We hypothesize that the smallest and largest weight updates during fine-tuning are more important than those of intermediate magnitude, a phenomenon we term "protecting both ends". Upon hypothesis validation, we introduce QuantLRM, which stands for weight quantization of LRMs via fine-tuning signals. We fit simple restricted quadratic functions on weight updates to protect both ends. By multiplying the average quadratic values with the count of zero weight updates of channels, we compute channel importance that is more effective than using activation or second-order information. We run…
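The channel-importance computation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact form of the restricted quadratic, how updates are normalized, or how a zero update is detected, so the parabola parameters (`vertex`, `curvature`, `floor`), the division by the largest update, the `zero_tol` threshold, and the column-as-channel layout are all assumptions, and the function names are hypothetical.

```python
from typing import List

# Assumed layout: each inner list is one row; each column is one channel.
Matrix = List[List[float]]


def restricted_quadratic(x: float, vertex: float = 0.5,
                         curvature: float = 4.0, floor: float = 0.0) -> float:
    """Upward-opening parabola (assumed form): smallest at `vertex`,
    largest at the two ends of [0, 1], so both ends are protected."""
    return curvature * (x - vertex) ** 2 + floor


def channel_importance(base: Matrix, tuned: Matrix,
                       zero_tol: float = 0.0) -> List[float]:
    """Score each channel as (mean quadratic value) * (count of zero updates)."""
    # Magnitude of the weight update for every entry.
    deltas = [[abs(t - b) for b, t in zip(base_row, tuned_row)]
              for base_row, tuned_row in zip(base, tuned)]

    # Assumption: rescale magnitudes to [0, 1] by the largest update.
    max_delta = max(max(row) for row in deltas)
    scale = max_delta if max_delta > 0 else 1.0

    n_rows, n_cols = len(deltas), len(deltas[0])
    scores = []
    for j in range(n_cols):
        column = [deltas[i][j] for i in range(n_rows)]
        mean_quadratic = sum(restricted_quadratic(d / scale) for d in column) / n_rows
        zero_count = sum(1 for d in column if d <= zero_tol)
        scores.append(mean_quadratic * zero_count)
    return scores
```

Read literally, the product gives a score of zero to any channel with no zero updates. Whether the method offsets or smooths the count is not stated in the abstract, so the sketch leaves the product as described.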

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications