Hierarchical Budget Policy Optimization for Adaptive Reasoning

Shangke Lyu; Linjuan Wu; Yuchen Yan; Xingyu Wu; Hao Li; Yongliang Shen; Peisheng Jiang; Weiming Lu; Jun Xiao; Yueting Zhuang

arXiv:2507.15844 · cs.AI · August 8, 2025

TL;DR

This paper introduces HBPO, a reinforcement learning framework that lets large reasoning models adapt their reasoning depth to each problem, cutting token usage by up to 60.6% while improving accuracy by 3.14% on reasoning benchmarks.

Contribution

HBPO is a novel hierarchical training method that allows models to learn problem-specific reasoning depths without sacrificing capability, addressing efficiency and exploration challenges.

Findings

01. Reduces token usage by up to 60.6%.

02. Improves accuracy by 3.14% on reasoning benchmarks.

03. Models exhibit emergent adaptive reasoning behavior.

Abstract

Large reasoning models achieve remarkable performance through extensive chain-of-thought generation, yet they suffer from a critical inefficiency: applying uniformly extensive reasoning regardless of problem complexity. We present Hierarchical Budget Policy Optimization (HBPO), a reinforcement learning framework that enables models to learn problem-specific reasoning depths without sacrificing capability. Unlike existing approaches that impose rigid constraints or rely on discrete mode selection, HBPO partitions the exploration space into budget-constrained hierarchies (512-2560 tokens), each with differentiated reward structures that preserve both efficiency incentives and reasoning capabilities. This design addresses a fundamental challenge in efficient reasoning training: traditional length penalties systematically bias models away from necessary long reasoning paths, causing…
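To make the idea of budget-constrained hierarchies concrete, here is a minimal sketch of how rollouts for one prompt might be split across budget levels and scored. It is not the authors' implementation: the abstract gives only the 512-2560 token range, so the specific budget levels, the round-robin assignment, the function names `assign_budgets` and `budget_reward`, and the deviation-based penalty with its `penalty_scale` parameter are all assumptions chosen for illustration.

```python
# Hypothetical budget levels; the source states only the 512-2560 token range.
BUDGETS = [512, 1024, 2048, 2560]


def assign_budgets(num_rollouts, budgets=BUDGETS):
    """Assign each rollout of a prompt to a budget level, round-robin.

    This splits the sampled responses into subgroups, one per budget,
    so that short and long reasoning paths are both explored.
    """
    return [budgets[i % len(budgets)] for i in range(num_rollouts)]


def budget_reward(correct, num_tokens, budget, penalty_scale=0.5):
    """Illustrative (assumed) reward for a single rollout.

    Wrong answers receive 0. Correct answers receive 1 minus a penalty
    proportional to the relative distance between the response length
    and the rollout's assigned budget, floored at 0.
    """
    if not correct:
        return 0.0
    deviation = abs(num_tokens - budget) / budget
    return max(0.0, 1.0 - penalty_scale * deviation)


if __name__ == "__main__":
    rollout_budgets = assign_budgets(8)
    # Hypothetical (correct, num_tokens) outcomes for the 8 rollouts.
    outcomes = [(True, 400), (True, 1500), (False, 2000), (True, 2560),
                (True, 512), (False, 900), (True, 2048), (True, 3000)]
    for budget, (correct, n_tokens) in zip(rollout_budgets, outcomes):
        print(budget, n_tokens, round(budget_reward(correct, n_tokens, budget), 3))
```

Because each subgroup is scored against its own budget in this sketch, a long but correct response in a high-budget subgroup is not penalized the way it would be under a single global length penalty, which is the bias the abstract attributes to traditional length penalties.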

Taxonomy

Topics: Reinforcement Learning in Robotics · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Games