Optimizing Anytime Reasoning via Budget Relative Policy Optimization

Penghui Qi; Zichen Liu; Tianyu Pang; Chao Du; Wee Sun Lee; Min Lin

arXiv:2505.13438 · cs.LG · November 10, 2025

TL;DR

This paper introduces AnytimeReasoner, a framework that trains large language models with reinforcement learning on verifiable rewards so that they reason well under varying token budgets, improving both token efficiency and flexibility.

Contribution

The paper proposes AnytimeReasoner, an RL-based framework that optimizes reasoning performance across varying token budgets, together with Budget Relative Policy Optimization (BRPO), a new variance reduction technique.

Findings

01 Outperforms GRPO across all thinking budgets.

02 Enhances token and training efficiency.

03 Demonstrates effectiveness in mathematical reasoning tasks.

Abstract

Scaling test-time compute is crucial for enhancing the reasoning capabilities of large language models (LLMs). Existing approaches typically employ reinforcement learning (RL) to maximize a verifiable reward obtained at the end of reasoning traces. However, such methods optimize only the final performance under a large and fixed token budget, which hinders efficiency in both training and deployment. In this work, we present a novel framework, AnytimeReasoner, to optimize anytime reasoning performance, which aims to improve token efficiency and the flexibility of reasoning under varying token budget constraints. To achieve this, we truncate the complete thinking process to fit within sampled token budgets from a prior distribution, compelling the model to summarize the optimal answer for each truncated thinking for verification. This introduces verifiable dense rewards into the reasoning…
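The truncation idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the uniform prior, the function names (`sample_budgets`, `truncate_thinking`, `anytime_reward`), and the toy summarizer and verifier are all assumptions made for illustration. It only shows how scoring a summary at each truncated prefix turns one end-of-trace reward into several per-budget rewards.

```python
import random
from typing import Callable, List, Sequence


def sample_budgets(max_budget: int, num_budgets: int, rng: random.Random) -> List[int]:
    """Draw sorted token budgets. A uniform prior is an assumption of this sketch."""
    return sorted(rng.randint(1, max_budget) for _ in range(num_budgets))


def truncate_thinking(thinking_tokens: Sequence[str], budget: int) -> List[str]:
    """Keep only the first `budget` tokens of the thinking trace."""
    return list(thinking_tokens[:budget])


def anytime_reward(
    thinking_tokens: Sequence[str],
    budgets: Sequence[int],
    summarize: Callable[[List[str]], str],
    verify: Callable[[str], float],
) -> float:
    """Average the verifiable reward of the summary produced at each budget."""
    rewards = [
        verify(summarize(truncate_thinking(thinking_tokens, budget)))
        for budget in budgets
    ]
    return sum(rewards) / len(rewards)


def toy_summarize(prefix: List[str]) -> str:
    """Hypothetical stand-in for the model's summary step: return the first answer seen."""
    for token in prefix:
        if token.startswith("answer="):
            return token.split("=", 1)[1]
    return ""


def toy_verify(answer: str) -> float:
    """Hypothetical verifier: reward 1.0 only for the known correct answer."""
    return 1.0 if answer == "42" else 0.0
```

With the trace `["step1", "step2", "answer=42", "check"]` and budgets `[1, 2, 3, 4]`, the two shortest prefixes contain no answer and the two longest do, so `anytime_reward` returns 0.5. A trace that reaches the answer earlier scores higher under the same budgets, which is the kind of behavior an anytime objective rewards.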

Code & Models

Repositories

sail-sg/anytimereasoner (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications