A Relative-Budget Theory for Reinforcement Learning with Verifiable Rewards in Large Language Model Reasoning
Akifumi Wachi, Hirota Kinoshita, Shokichi Takakura, Rei Higuchi, Taiji Suzuki

TL;DR
This paper introduces a relative-budget theory for reinforcement learning in large language model reasoning, explaining how different compute budgets affect learning efficiency and reasoning performance through a unified framework.
Contribution
It proposes a novel relative-budget metric that predicts RL sample efficiency and reasoning success, supported by theoretical analysis and empirical validation.
Findings
Optimal learning occurs at a relative budget around 1.5 to 2.0.
Three regimes of RL efficiency are identified based on the relative budget.
Finite-sample guarantees are provided for online RL across regimes.
Abstract
Reinforcement learning (RL) is a dominant paradigm for improving the reasoning abilities of large language models, yet its effectiveness varies across tasks and compute budgets. We propose a relative-budget theory explaining this variation through a single quantity called the relative budget, which is defined from the generation horizon (token budget) and the number of tokens until the first correct solution under a base policy. We show that the relative budget determines sample efficiency by controlling reward variance and the likelihood of informative trajectories. Our analysis reveals three regimes: in the deficient regime, informative trajectories are rare and the sample complexity explodes; in the balanced regime, informative trajectories occur with non-negligible probability and RL is maximally sample-efficient; and in…
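As a rough illustration of the quantity the abstract describes, the sketch below estimates a relative budget from sampled rollouts. The exact formula is not legible in this summary, so the code assumes the relative budget is the generation horizon divided by the mean number of tokens until the first correct solution; that ratio, the function names, and the sample values are all hypothetical and not taken from the paper.

```python
from statistics import mean


def relative_budget(horizon, first_success_tokens):
    """Assumed definition (not confirmed by the source):
    generation horizon / mean tokens until the first correct solution."""
    if horizon <= 0:
        raise ValueError("horizon must be positive")
    if not first_success_tokens:
        raise ValueError("need at least one sampled token count")
    return horizon / mean(first_success_tokens)


def informative_fraction(horizon, first_success_tokens):
    """Fraction of sampled rollouts whose first correct solution
    would fit inside the token budget."""
    if not first_success_tokens:
        raise ValueError("need at least one sampled token count")
    hits = sum(1 for t in first_success_tokens if t <= horizon)
    return hits / len(first_success_tokens)


# Hypothetical tokens-to-first-correct-solution under a base policy.
samples = [200, 400, 600, 800]

print(relative_budget(1000, samples))      # horizon of 1000 tokens
print(informative_fraction(500, samples))  # horizon of 500 tokens
```

Under this assumed definition, shrinking the horizon lowers both the ratio and the share of rollouts that can reach a correct answer, which matches the intuition the abstract gives for the deficient regime.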
Taxonomy
Topics: Topic Modeling · Explainable Artificial Intelligence (XAI) · Natural Language Processing Techniques
