A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning

Akifumi Wachi; Hirota Kinoshita; Shokichi Takakura; Rei Higuchi; Taiji Suzuki

arXiv:2602.01523·cs.LG·February 3, 2026

Open Access

TL;DR

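This paper introduces a relative-budget theory for reinforcement learning with verifiable rewards in large language model reasoning. A single quantity, the relative budget $\xi := H / \mathbb{E}[T]$ (the token budget divided by the expected number of tokens until the first correct solution under a base policy), explains how sample efficiency varies across tasks and compute budgets.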

Contribution

It proposes a novel relative-budget metric that predicts RL sample efficiency and reasoning success, supported by theoretical analysis and empirical validation.

Findings

01. Optimal learning occurs at a relative budget around 1.5 to 2.0.

02. Three regimes of RL efficiency are identified based on the relative budget.

03. Finite-sample guarantees are provided for online RL across regimes.

Abstract

Reinforcement learning (RL) is a dominant paradigm for improving the reasoning abilities of large language models, yet its effectiveness varies across tasks and compute budgets. We propose a relative-budget theory explaining this variation through a single quantity called relative budget $\xi := H / \mathbb{E}[T]$, where $H$ is the generation horizon (token budget) and $T$ denotes the number of tokens until the first correct solution under a base policy. We show that $\xi$ determines sample efficiency by controlling reward variance and the likelihood of informative trajectories. Our analysis reveals three regimes: in the deficient regime ($\xi \to 0$), informative trajectories are rare and the sample complexity explodes; in the balanced regime ($\xi = \Theta(1)$), informative trajectories occur with non-negligible probability and RL is maximally sample-efficient; and in…
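
As a rough illustration of the definition above, the following Python sketch estimates the relative budget from a handful of observed values of $T$ and reports the fraction of samples that would finish within the horizon $H$. The function names and the sample values are hypothetical and not taken from the paper. The empirical mean stands in for $\mathbb{E}[T]$, and the within-budget fraction is only a crude proxy for how often a rollout can reach a correct solution. It does not reproduce the paper's sample-complexity analysis.

```python
from statistics import mean


def relative_budget(horizon, first_success_tokens):
    """Estimate xi = H / E[T], using the sample mean of T in place of E[T]."""
    if not first_success_tokens:
        raise ValueError("need at least one sample of T")
    return horizon / mean(first_success_tokens)


def within_budget_fraction(horizon, first_success_tokens):
    """Fraction of samples whose first correct solution arrives within H tokens."""
    hits = sum(1 for t in first_success_tokens if t <= horizon)
    return hits / len(first_success_tokens)


# Hypothetical tokens-to-first-correct-solution under a base policy (mean = 500).
samples = [200, 400, 600, 800]

for H in (50, 500, 5000):
    xi = relative_budget(H, samples)
    frac = within_budget_fraction(H, samples)
    print(f"H={H}: xi={xi:.2f}, within-budget fraction={frac:.2f}")
```

With these made-up samples, a horizon of 50 tokens gives a relative budget of 0.1 and no sample finishes in time, which mirrors the deficient regime in which informative trajectories are rare. A horizon of 500 tokens gives a relative budget of 1.0, and half the samples finish within budget.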

Taxonomy

Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques