Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching

Rongzhe Wei, Ge Shi, Min Cheng, Na Zhang, Pan Li, Sarthak Ghosh, Vaibhav Gorde, and Leman Akoglu

arXiv:2604.12126 · cs.AI · April 15, 2026

TL;DR

This paper introduces SLATE, a benchmark for evaluating large language model agents in tool-rich environments, and proposes Entropy-Guided Branching (EGB), a search method intended to make multi-step task execution more successful and less costly.

Contribution

The paper presents SLATE (Synthetic Large-scale API Toolkit for E-commerce), a large-scale benchmark for tool-augmented agents, and proposes Entropy-Guided Branching (EGB), a search algorithm intended to improve both exploration and planning efficiency.
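This page does not describe how Entropy-Guided Branching works internally, so the following is only a generic illustration of the idea the name suggests, not the authors' algorithm. It assumes the agent exposes a probability distribution over candidate next tools, and it uses a hypothetical rule: follow the single best candidate when the distribution is peaked (low entropy) and expand several candidates when it is uncertain (high entropy). The function names, the threshold, the branch cap, and the tool names are all assumptions made for the sketch.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in nats) of a discrete probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def branch_width(probs, threshold=0.5, max_branches=3):
    """Hypothetical rule: 1 branch if entropy <= threshold, else up to max_branches."""
    if shannon_entropy(probs) <= threshold:
        return 1
    nonzero = sum(1 for p in probs if p > 0.0)
    return min(max_branches, nonzero)


def select_candidates(tool_probs, threshold=0.5, max_branches=3):
    """Return the tool names to expand at this step, most probable first.

    tool_probs maps a (hypothetical) tool name to its probability.
    """
    k = branch_width(list(tool_probs.values()), threshold, max_branches)
    ranked = sorted(tool_probs.items(), key=lambda kv: kv[1], reverse=True)
    return [name for name, _ in ranked[:k]]


if __name__ == "__main__":
    # Peaked distribution: the sketch commits to one tool.
    confident = {"search_orders": 0.97, "issue_refund": 0.02, "track_package": 0.01}
    # Flat distribution: the sketch spends extra search budget here.
    uncertain = {"search_orders": 0.40, "issue_refund": 0.35, "track_package": 0.25}
    print(select_candidates(confident))
    print(select_candidates(uncertain))
```

The design choice illustrated here is that search budget is spent only at steps where the model is unsure, which is one plausible way to reduce the cost of exploring a large tool space; the paper's actual branching criterion may differ.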

Findings

01. Current agents struggle with self-correction and search efficiency on SLATE.

02. EGB significantly improves task success rates.

03. EGB reduces computational costs during planning.

Abstract

Large Language Models (LLMs) have significantly advanced tool-augmented agents, enabling autonomous reasoning via API interactions. However, executing multi-step tasks within massive tool libraries remains challenging due to two critical bottlenecks: (1) the absence of rigorous, plan-level evaluation frameworks and (2) the computational demand of exploring vast decision spaces stemming from large toolsets and long-horizon planning. To bridge these gaps, we first introduce SLATE (Synthetic Large-scale API Toolkit for E-commerce), a large-scale context-aware benchmark designed for the automated assessment of tool-integrated agents. Unlike static metrics, SLATE accommodates diverse yet functionally valid execution trajectories, revealing that current agents struggle with self-correction and search efficiency. Motivated by these findings, we next propose Entropy-Guided Branching (EGB), an…
