QuantLRM: Quantization of Large Reasoning Models via Fine-Tuning Signals
Nan Zhang, Eugene Kwek, Yusen Zhang, Muyu Pan, Suhang Wang, Prasenjit Mitra, Rui Zhang

TL;DR
QuantLRM is a weight-only quantization method for large reasoning models that uses fine-tuning signals to guide compression, improving quantized-model performance across multiple reasoning benchmarks.
Contribution
This paper proposes QuantLRM, a weight quantization approach that derives channel importance from the magnitude of weight updates during fine-tuning. The authors report that this importance measure is more effective than using activation or second-order information.
Findings
QuantLRM improves quantization performance by an average of 6.55% on RL fine-tuned models.
The method is effective across various fine-tuning types and reasoning benchmarks.
Pseudo-fine-tuning signals enable QuantLRM to work well even without actual fine-tuning.
Abstract
Weight-only quantization is important for compressing Large Language Models (LLMs). Inspired by the spirit of classical magnitude pruning, we study whether the magnitude of weight updates during reasoning-incentivized fine-tuning can provide valuable signals for quantizing Large Reasoning Models (LRMs). We hypothesize that the smallest and largest weight updates during fine-tuning are more important than those of intermediate magnitude, a phenomenon we term "protecting both ends". Upon hypothesis validation, we introduce QuantLRM, which stands for weight quantization of LRMs via fine-tuning signals. We fit simple restricted quadratic functions on weight updates to protect both ends. By multiplying the average quadratic values with the count of zero weight updates of channels, we compute channel importance that is more effective than using activation or second-order information. We run…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Topic Modeling · Multimodal Machine Learning Applications
