New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR
Zhilin Wang, Yafu Li, Shunkai Zhang, Zhi Wang, Haoran Zhang, Xiaoye Qu, Yu Cheng

TL;DR
This paper presents a probabilistic framework to explain how reinforcement learning with verifiable rewards (RLVR) enhances reasoning in large language models by sharpening atomic step probabilities, leading to emergent complex capabilities.
Contribution
It introduces a new probabilistic perspective on RLVR, demonstrating that improving atomic step success rates enables models to develop complex reasoning skills from single-step training.
Findings
RLVR incentivizes exploration of previously inaccessible solution paths by amplifying the model's existing skills.
Composite success correlates strongly with atomic step probabilities.
RLVR can cause skill trade-offs to maximize overall reward.
Abstract
Whether Reinforcement Learning with Verifiable Rewards (RLVR) endows Large Language Models (LLMs) with new capabilities or merely elicits latent traces remains a central debate. In this work, we align with the former view, proposing a probabilistic framework where capability is defined by instance-level solvability. We hypothesize that the emergence of complex reasoning can be driven by sharpening atomic step probabilities, which enables models to overcome the exponential decay of success rates inherent in multi-step reasoning chains. Utilizing the Algebrarium framework, we train models exclusively on single-step operations and evaluate their performance on unseen multi-step tasks. Our empirical results confirm that: (1) RLVR incentivizes the exploration of previously inaccessible solution paths by amplifying the model's existing skills; (2) composite performance is strictly governed by…
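The "exponential decay of success rates" mentioned in the abstract can be illustrated with a small sketch. It assumes that every step in a chain succeeds independently with the same probability, which is a simplification made here for illustration and not a claim taken from the paper. The function name `chain_success` and the example probabilities are hypothetical.

```python
def chain_success(p_step: float, n_steps: int) -> float:
    """Success probability of an n-step chain.

    Assumption (not from the paper): steps succeed independently,
    each with the same probability p_step, so the chain succeeds
    with probability p_step ** n_steps.
    """
    if not 0.0 <= p_step <= 1.0:
        raise ValueError("p_step must be a probability in [0, 1]")
    if n_steps < 0:
        raise ValueError("n_steps must be non-negative")
    return p_step ** n_steps


if __name__ == "__main__":
    # Compare a weaker and a sharper atomic step at several chain lengths.
    for p in (0.90, 0.99):
        for n in (1, 5, 10, 20):
            print(f"p_step={p:.2f}  n_steps={n:2d}  chain={chain_success(p, n):.3f}")
```

Under this assumption, raising the per-step success rate from 0.90 to 0.99 lifts a 10-step chain from roughly 0.35 to roughly 0.90, which is the kind of gain the sharpening hypothesis relies on.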
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications
