Optimizing Anytime Reasoning via Budget Relative Policy Optimization
Penghui Qi, Zichen Liu, Tianyu Pang, Chao Du, Wee Sun Lee, Min Lin

TL;DR
This paper introduces AnytimeReasoner, a framework that improves large language models' reasoning efficiency and flexibility by optimizing token usage and reasoning under variable budgets through a novel RL approach with verifiable rewards.
Contribution
The paper proposes a new RL-based framework, AnytimeReasoner, with a novel variance reduction technique, BRPO, to optimize reasoning performance across varying token budgets.
Findings
Outperforms GRPO across all thinking budgets.
Enhances token and training efficiency.
Demonstrates effectiveness in mathematical reasoning tasks.
Abstract
Scaling test-time compute is crucial for enhancing the reasoning capabilities of large language models (LLMs). Existing approaches typically employ reinforcement learning (RL) to maximize a verifiable reward obtained at the end of reasoning traces. However, such methods optimize only the final performance under a large and fixed token budget, which hinders efficiency in both training and deployment. In this work, we present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance, which aims to improve token efficiency and the flexibility of reasoning under varying token budget constraints. To achieve this, we truncate the complete thinking process to fit within sampled token budgets from a prior distribution, compelling the model to summarize the optimal answer for each truncated thinking for verification. This introduces verifiable dense rewards into the reasoning…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
