Evaluating Parameter Efficient Methods for RLVR

Qingyu Yin; Yulun Wu; Zhennan Shen; Sunbowen Li; Zhilin Wang; Yanshu Li; Chak Tou Leong; Jiale Kang; Jinjin Gu

arXiv:2512.23165·cs.LG·January 1, 2026

Open Access

TL;DR

This paper systematically evaluates over 12 parameter-efficient fine-tuning (PEFT) methods for reinforcement learning with verifiable rewards (RLVR). It finds that structural variants outperform standard LoRA and that SVD-informed initialization strategies suffer from spectral collapse.

Contribution

The paper provides the first comprehensive comparison of over 12 PEFT methods in RLVR, identifying which architectures outperform standard LoRA and analyzing the failure modes of commonly used strategies.

Findings

1. Structural variants such as DoRA and AdaLoRA outperform LoRA.

2. Spectral collapse occurs in SVD-informed initialization methods.

3. Extreme parameter reduction hampers reasoning capacity.

Abstract

We systematically evaluate Parameter-Efficient Fine-Tuning (PEFT) methods under the paradigm of Reinforcement Learning with Verifiable Rewards (RLVR). RLVR incentivizes language models to enhance their reasoning capabilities through verifiable feedback; however, while methods like LoRA are commonly used, the optimal PEFT architecture for RLVR remains unidentified. In this work, we conduct the first comprehensive evaluation of over 12 PEFT methodologies across the DeepSeek-R1-Distill families on mathematical reasoning benchmarks. Our empirical results challenge the default adoption of standard LoRA with three main findings. First, we demonstrate that structural variants, such as DoRA, AdaLoRA, and MiSS, consistently outperform LoRA. Second, we uncover a spectral collapse phenomenon in SVD-informed initialization strategies (e.g., PiSSA, MiLoRA), attributing their failure to a…
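To make the term "SVD-informed initialization" concrete, the following is a minimal NumPy sketch of the general idea: the pretrained weight is decomposed by SVD, a rank-r slice of the spectrum seeds the trainable low-rank factors, and the remainder is frozen. Seeding from the largest singular components corresponds to the PiSSA-style choice and seeding from the smallest to the MiLoRA-style choice. The function names, the Gaussian stand-in for LoRA's default initializer, and the toy matrix sizes are assumptions made for illustration; this is not the paper's code and it does not reproduce the spectral collapse result.

```python
import numpy as np


def svd_informed_init(weight, rank, which="principal"):
    """Split `weight` into a frozen residual plus trainable factors up @ down.

    which="principal": seed the adapter with the top-`rank` singular components.
    which="minor":     seed the adapter with the bottom-`rank` singular components.
    (Hypothetical helper for illustration only.)
    """
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    if which == "principal":
        idx = slice(0, rank)
    elif which == "minor":
        idx = slice(len(s) - rank, len(s))
    else:
        raise ValueError("which must be 'principal' or 'minor'")

    # Share each singular value evenly between the two factors.
    sqrt_s = np.sqrt(s[idx])
    up = u[:, idx] * sqrt_s            # shape (out_dim, rank)
    down = sqrt_s[:, None] * vt[idx]   # shape (rank, in_dim)

    # The frozen part is whatever the adapter does not represent.
    residual = weight - up @ down
    return residual, up, down


def lora_default_init(shape, rank, rng):
    """Standard LoRA-style start: one factor is zero, so the initial update is zero.

    A small Gaussian is used for the other factor as a stand-in initializer.
    """
    out_dim, in_dim = shape
    up = np.zeros((out_dim, rank))
    down = 0.01 * rng.standard_normal((rank, in_dim))
    return up, down
```

In both SVD-informed variants the model's output is unchanged at step zero, because residual + up @ down reconstructs the original weight; what differs from standard LoRA is that the trainable factors start at a nonzero point determined by the pretrained spectrum.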

Taxonomy

Topics: Reinforcement Learning in Robotics · Machine Learning and Data Classification · Explainable Artificial Intelligence (XAI)