New Skills or Sharper Primitives? A Probabilistic Perspective on the Emergence of Reasoning in RLVR

Zhilin Wang; Yafu Li; Shunkai Zhang; Zhi Wang; Haoran Zhang; Xiaoye Qu; Yu Cheng

arXiv:2602.08281·cs.CL·February 10, 2026

Open Access

TL;DR

This paper presents a probabilistic framework to explain how reinforcement learning with verifiable rewards (RLVR) enhances reasoning in large language models by sharpening atomic step probabilities, leading to emergent complex capabilities.

Contribution

The paper introduces a probabilistic perspective on RLVR, showing that raising atomic step success rates lets models trained only on single-step operations solve unseen multi-step tasks.

Findings

1. RLVR incentivizes the exploration of previously inaccessible solution paths by amplifying the model's existing skills.

2. Composite success correlates strongly with atomic step probabilities.

3. RLVR can cause skill trade-offs to maximize overall reward.

Abstract

Whether Reinforcement Learning with Verifiable Rewards (RLVR) endows Large Language Models (LLMs) with new capabilities or merely elicits latent traces remains a central debate. In this work, we align with the former view, proposing a probabilistic framework where capability is defined by instance-level solvability. We hypothesize that the emergence of complex reasoning can be driven by sharpening atomic step probabilities, which enables models to overcome the exponential decay of success rates inherent in multi-step reasoning chains. Utilizing the Algebrarium framework, we train models exclusively on single-step operations and evaluate their performance on unseen multi-step tasks. Our empirical results confirm that: (1) RLVR incentivizes the exploration of previously inaccessible solution paths by amplifying the model's existing skills; (2) composite performance is strictly governed by…
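The "exponential decay of success rates" mentioned in the abstract can be illustrated with a small calculation. The sketch below is not the paper's framework. It assumes, purely for illustration, that each atomic step succeeds independently with a fixed probability, so a chain of n steps succeeds with probability p^n. The function names `chain_success` and `uniform_chain_success` are hypothetical.

```python
def chain_success(step_probs):
    """Probability that every step in a chain succeeds.

    Assumption (not from the paper): steps succeed independently,
    so the chain probability is the product of the step probabilities.
    """
    prob = 1.0
    for p in step_probs:
        prob *= p
    return prob


def uniform_chain_success(p, n_steps):
    """Special case where every step has the same success probability p."""
    return p ** n_steps


if __name__ == "__main__":
    # Compare how chain success falls off with length for a few step accuracies.
    for p in (0.80, 0.90, 0.99):
        row = ", ".join(
            f"n={n}: {uniform_chain_success(p, n):.3f}" for n in (1, 5, 10, 20)
        )
        print(f"p={p:.2f} -> {row}")
```

Under this simplified model, lifting per-step success from 0.90 to 0.99 raises the success rate of a 10-step chain from roughly 0.35 to roughly 0.90, which is the kind of effect the abstract attributes to sharpening atomic step probabilities.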

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications