Long-Horizon Plan Execution in Large Tool Spaces through Entropy-Guided Branching
Rongzhe Wei, Ge Shi, Min Cheng, Na Zhang, Pan Li, Sarthak Ghosh, Vaibhav Gorde, and Leman Akoglu

TL;DR
This paper introduces a new benchmark for evaluating large language model agents in tool-rich environments and proposes an entropy-guided search method that improves both the success rate and the efficiency of multi-step task execution.
Contribution
The paper presents SLATE, a large-scale benchmark for tool-augmented agents, and proposes Entropy-Guided Branching, a novel search algorithm that enhances exploration and efficiency.
Findings
Current agents struggle with self-correction and search efficiency on SLATE.
EGB significantly improves task success rates.
EGB reduces computational costs during planning.
Abstract
Large Language Models (LLMs) have significantly advanced tool-augmented agents, enabling autonomous reasoning via API interactions. However, executing multi-step tasks within massive tool libraries remains challenging due to two critical bottlenecks: (1) the absence of rigorous, plan-level evaluation frameworks and (2) the computational demand of exploring vast decision spaces stemming from large toolsets and long-horizon planning. To bridge these gaps, we first introduce SLATE (Synthetic Large-scale API Toolkit for E-commerce), a large-scale context-aware benchmark designed for the automated assessment of tool-integrated agents. Unlike static metrics, SLATE accommodates diverse yet functionally valid execution trajectories, revealing that current agents struggle with self-correction and search efficiency. Motivated by these findings, we next propose Entropy-Guided Branching (EGB), an…
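The abstract is cut off before it describes how EGB works. The sketch below is only a generic illustration of what entropy-guided branching can mean, and it is not the authors' algorithm. It assumes (hypothetically) that the agent assigns a normalized probability to each candidate next tool. It then measures the Shannon entropy of that distribution: if the agent is confident (low entropy), the search expands only the top tool, and if it is uncertain (high entropy), it branches into several alternatives. The names `shannon_entropy`, `select_branches`, `threshold`, `max_branches`, and the example tool names are all illustrative assumptions.

```python
import math
from typing import Dict, List


def shannon_entropy(probs: Dict[str, float]) -> float:
    """Entropy (in nats) of a normalized distribution over candidate tools."""
    return -sum(p * math.log(p) for p in probs.values() if p > 0)


def select_branches(probs: Dict[str, float], threshold: float, max_branches: int) -> List[str]:
    """Hypothetical branching rule (an assumption, not the paper's EGB):
    expand only the most likely tool when entropy <= threshold,
    otherwise expand the `max_branches` most likely tools."""
    # Rank candidate tools from most to least probable.
    ranked = sorted(probs, key=probs.get, reverse=True)
    if shannon_entropy(probs) <= threshold:
        return ranked[:1]           # confident: follow a single path
    return ranked[:max_branches]    # uncertain: branch to explore alternatives


# Illustrative usage with made-up tool names and probabilities.
confident = {"search_products": 0.9, "get_cart": 0.05, "checkout": 0.05}
uncertain = {"search_products": 0.4, "get_order_status": 0.35, "list_categories": 0.25}
print(select_branches(confident, threshold=0.5, max_branches=2))
print(select_branches(uncertain, threshold=0.5, max_branches=2))
```

This kind of design spends extra search budget only at steps where the tool choice is ambiguous. That is one plausible way a method could trade off exploration against planning cost in a large tool space.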
