Maximizing Rollout Informativeness under a Fixed Budget: A Submodular View of Tree Search for Tool-Use Agentic Reinforcement Learning

Yuelin Hu; Zhenbo Yu; Zhengxue Cheng; Wei Liu; Li Song

arXiv:2605.05262 · stat.ML · May 8, 2026

TL;DR

This paper introduces InfoTree, a training-time tree-search framework for tool-use agentic reinforcement learning that uses submodular optimization to maximize rollout informativeness under a fixed budget, and reports performance gains over several baselines.

Contribution

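It formalizes Rollout Informativeness under a Fixed Budget (RIFB), recasts intermediate state selection as a monotone submodular maximization problem, and develops InfoTree, which combines an Uncertainty-aware Upper Confidence Bound (UUCB), a learned Adaptive Budget Allocator (ABA), and Speculative Expansion to improve efficiency and performance.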
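The guarantee behind this framing is a classical one: for a monotone submodular objective under a cardinality budget, picking the candidate with the largest marginal gain at each step reaches at least (1 - 1/e) of the optimum. The sketch below illustrates only that greedy mechanism. The weighted-coverage objective, the candidate names `s1`..`s4`, and the helpers `coverage` and `greedy_select` are hypothetical stand-ins for intermediate states and a submodular score; they are not the paper's RIFB objective or its UUCB selector.

```python
from typing import Dict, List, Optional, Set


def coverage(selected: List[str], covers: Dict[str, Set[int]],
             weights: Dict[int, float]) -> float:
    """Hypothetical weighted-coverage objective (monotone and submodular).

    Each candidate "covers" a set of elements; the value is the total weight
    of the union, so adding a candidate never hurts (monotone) and helps less
    as more elements are already covered (diminishing returns).
    """
    covered: Set[int] = set()
    for cand in selected:
        covered |= covers[cand]
    return sum(weights[e] for e in covered)


def greedy_select(covers: Dict[str, Set[int]], weights: Dict[int, float],
                  budget: int) -> List[str]:
    """Greedy maximization: repeatedly add the candidate with the largest marginal gain."""
    selected: List[str] = []
    for _ in range(budget):
        base = coverage(selected, covers, weights)
        best: Optional[str] = None
        best_gain = 0.0
        for cand in covers:
            if cand in selected:
                continue
            gain = coverage(selected + [cand], covers, weights) - base
            if gain > best_gain:
                best, best_gain = cand, gain
        if best is None:  # no remaining candidate adds positive value
            break
        selected.append(best)
    return selected


if __name__ == "__main__":
    covers = {"s1": {0, 1, 2}, "s2": {2, 3}, "s3": {3, 4}, "s4": {0}}
    weights = {e: 1.0 for e in range(5)}
    picked = greedy_select(covers, weights, budget=2)
    print(picked, coverage(picked, covers, weights))
```

Here the greedy pass takes `s1` first (gain 3), then `s3` (gain 2) rather than `s2` (gain 1), because `s2` mostly overlaps what is already covered; that diminishing-returns behavior is what makes the one-step greedy rule provably near-optimal.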

Findings

01. ABA improves prompt utilization from 58.1% to 76.3%.
02. Speculative Expansion reduces overhead from 14.3% to 4.8%.
03. InfoTree outperforms several baselines across nine diverse benchmarks.

Abstract

We formalize Rollout Informativeness under a Fixed Budget (RIFB) as the expected non-vanishing policy-gradient mass that a tool-use rollout set injects into Group Relative Policy Optimization (GRPO). We prove that any budget-agnostic independent sampler suffers a collapse rate bounded away from zero for hard prompts regardless of the budget. Motivated by this, we recast intermediate state selection as a monotone submodular maximization problem, where a greedy one-step selector enjoys a (1 - 1/e) approximation guarantee. Our Uncertainty-aware Upper Confidence Bound (UUCB) terms arise as closed-form marginal gains of this objective. This turns the token-level entropy bonus from an empirical trick into an analytic consequence of the formulation. We present InfoTree, a training-time tree-search framework coupling UUCB with a learned Adaptive Budget Allocator (ABA) and an asynchronous…
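To see why gradient mass can vanish, recall that GRPO standardizes rewards within each prompt's group of rollouts; if every rollout in a group receives the same reward, every advantage is zero and that prompt contributes no policy gradient. The sketch below shows this mechanism under simplifying assumptions: binary rewards, population standard deviation with a small `eps` (implementations differ on this choice), and rollouts drawn i.i.d. with success rate `p`. The helper names are hypothetical, and the closed form `p**G + (1 - p)**G` is only the elementary all-agree probability for i.i.d. binary rollouts, not the paper's collapse bound.

```python
import math
from typing import List


def group_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """GRPO-style advantages: standardize each reward within its rollout group.

    Uses the population standard deviation plus eps; this is one common
    variant, assumed here for illustration.
    """
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    return [(r - mean) / (std + eps) for r in rewards]


def is_collapsed(rewards: List[float], tol: float = 1e-12) -> bool:
    """True when every advantage is (numerically) zero, i.e. no gradient signal."""
    return all(abs(a) < tol for a in group_advantages(rewards))


def independent_collapse_prob(p: float, group_size: int) -> float:
    """Probability that group_size i.i.d. binary rollouts with success rate p all agree.

    all-success term p**G plus all-failure term (1 - p)**G.
    """
    return p ** group_size + (1.0 - p) ** group_size


if __name__ == "__main__":
    print(is_collapsed([1.0, 1.0, 1.0, 1.0]))  # identical rewards: zero advantages
    print(is_collapsed([1.0, 0.0, 0.0, 0.0]))  # one success: non-zero signal
    # For a hard prompt (small p), the all-failure term dominates.
    print(independent_collapse_prob(0.05, 8))
```

With `p = 0.05` and a group of 8, roughly two thirds of such groups would carry no gradient at all, which illustrates why allocating rollouts adaptively, rather than sampling each prompt independently, can matter for hard prompts.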
