Reinforcement Learning vs. Distillation: Understanding Accuracy and Capability in LLM Reasoning
Minwu Kim, Anubhav Shrestha, Safal Shrestha, Aadim Nepal, Keith Ross

TL;DR
This paper compares reinforcement learning with verifiable rewards (RLVR) and distillation in large language models, examining how each affects accuracy (pass@1) and capability (pass@k) on reasoning tasks and the mechanisms behind those effects.
Contribution
It explains why RLVR and distillation affect LLM reasoning differently, identifying the mechanisms and limitations behind their effects on accuracy and capability.
Findings
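RLVR improves accuracy (pass@1) mainly on easier questions, at the expense of accuracy on the most difficult questions, so capability (pass@k) often fails to improve.
In the authors' small-model settings, RLVR produces quality responses that were absent from the model's original output distribution.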
Capability does not always improve with distillation, especially without new knowledge.
Abstract
Recent studies have shown that reinforcement learning with verifiable rewards (RLVR) enhances overall accuracy (pass@1) but often fails to improve capability (pass@k) of LLMs in reasoning tasks, while distillation can improve both. In this paper, we investigate the mechanisms behind these phenomena. First, we demonstrate that RLVR struggles to improve capability as it focuses on improving the accuracy of the easier questions to the detriment of the accuracy of the most difficult questions. Second, we show that RLVR does not merely increase the success probability for the easier questions, but in our small model settings, produces quality responses that were absent in its original output distribution. In addition, we show these responses are neither noticeably longer nor feature more reflection-related keywords, underscoring the need for more reliable indicators of response quality.…
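The distinction between accuracy (pass@1) and capability (pass@k) can be made concrete with a short sketch. The code below uses the widely used unbiased pass@k estimator, 1 - C(n-c, k) / C(n, k), for n sampled responses of which c are correct; the abstract does not say which estimator the authors use, so treat this choice as an assumption. The function names and the per-question correct counts are hypothetical, invented only to show how pass@1 can rise while pass@k falls when gains on an easier question come at the cost of a harder one.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Estimate pass@k for one question from n samples, c of them correct.

    Assumption: the standard unbiased estimator 1 - C(n-c, k) / C(n, k).
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    # comb(n - c, k) is 0 when fewer than k samples are incorrect,
    # so the estimate is exactly 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(correct_counts: list[int], n: int, k: int) -> float:
    """Average pass@k over questions, each sampled n times."""
    return sum(pass_at_k(n, c, k) for c in correct_counts) / len(correct_counts)


# Hypothetical correct counts out of n = 16 samples for two questions
# (one easier, one harder). These numbers are made up for illustration.
N_SAMPLES = 16
before_training = [8, 1]   # easier question: 8/16, harder question: 1/16
after_training = [16, 0]   # easier question: 16/16, harder question: 0/16

acc_before = mean_pass_at_k(before_training, N_SAMPLES, k=1)
acc_after = mean_pass_at_k(after_training, N_SAMPLES, k=1)
cap_before = mean_pass_at_k(before_training, N_SAMPLES, k=16)
cap_after = mean_pass_at_k(after_training, N_SAMPLES, k=16)
```

With these made-up counts, mean pass@1 rises from 0.28125 to 0.5 while mean pass@16 falls from 1.0 to 0.5: the model answers the easier question more reliably but no longer solves the harder one in any sample.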
Taxonomy
Topics: Artificial Intelligence in Law · Software Engineering Research · Modeling, Simulation, and Optimization
