Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning

Minwu Kim; Anubhav Shrestha; Safal Shrestha; Aadim Nepal; Keith Ross

arXiv:2505.14216·cs.AI·November 3, 2025

TL;DR

This paper compares reinforcement learning with verifiable rewards (RLVR) and distillation in large language models, examining why RLVR raises accuracy (pass@1) but often fails to raise capability (pass@k), while distillation can improve both.

Contribution

The paper investigates the mechanisms by which RLVR and distillation affect LLM reasoning differently, and identifies the limits of each with respect to accuracy and capability.

Findings

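01 RLVR improves accuracy (pass@1) but struggles to improve capability (pass@k), because it gains on easier questions at the expense of the most difficult ones.

02 In the small-model settings studied, RLVR produces quality responses that were absent from the model's original output distribution.

03 Distillation does not always improve capability, especially when it introduces no new knowledge.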

Abstract

Recent studies have shown that reinforcement learning with verifiable rewards (RLVR) enhances overall accuracy (pass@1) but often fails to improve capability (pass@k) of LLMs in reasoning tasks, while distillation can improve both. In this paper, we investigate the mechanisms behind these phenomena. First, we demonstrate that RLVR struggles to improve capability as it focuses on improving the accuracy of the easier questions to the detriment of the accuracy of the most difficult questions. Second, we show that RLVR does not merely increase the success probability for the easier questions, but in our small model settings, produces quality responses that were absent in its original output distribution. In addition, we show these responses are neither noticeably longer nor feature more reflection-related keywords, underscoring the need for more reliable indicators of response quality.…
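The distinction between accuracy (pass@1) and capability (pass@k) can be made concrete with a small sketch. It uses the standard unbiased pass@k estimator, 1 - C(n-c, k)/C(n, k) for n samples of which c are correct; whether the paper computes pass@k in exactly this way is an assumption. The function names and the per-question counts below are hypothetical toy values chosen for illustration, not results from the paper.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one question: n samples drawn, c correct."""
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: every size-k subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over questions, given the correct-sample count per question."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical toy data: correct samples out of n = 8 on three questions.
N_SAMPLES = 8
base_counts = [4, 4, 1]    # base model: unreliable, but occasionally solves the hard question
tuned_counts = [8, 8, 0]   # tuned model: reliable on easy questions, never solves the hard one

for name, counts in [("base", base_counts), ("tuned", tuned_counts)]:
    p1 = mean_pass_at_k(counts, N_SAMPLES, 1)
    p8 = mean_pass_at_k(counts, N_SAMPLES, 8)
    print(f"{name}: pass@1={p1:.3f} pass@8={p8:.3f}")
```

In this toy setup the tuned model has higher pass@1 but lower pass@8 than the base model, which is the pattern the abstract describes: gains concentrated on easier questions raise accuracy, while losing the rare correct answers on the hardest question lowers capability.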

Code & Models

Repositories

minwukim/rlvsdistillation
PyTorch · Official

Taxonomy

Topics: Artificial Intelligence in Law · Software Engineering Research · Modeling, Simulation, and Optimization