FlowBench: Revisiting and Benchmarking Workflow-Guided Planning for LLM-based Agents
Ruixuan Xiao, Wentao Ma, Ke Wang, Yuchuan Wu, Junbo Zhao, Haobo Wang, Fei Huang, Yongbin Li

TL;DR
FlowBench introduces a comprehensive benchmark for evaluating workflow-guided planning in LLM-based agents, highlighting the need for improved planning reliability across diverse knowledge formats and scenarios.
Contribution
This paper formalizes various workflow knowledge formats and presents FlowBench, the first benchmark for assessing workflow-guided planning in LLM agents across multiple domains.
Findings
Current LLM agents require significant improvements for effective planning.
The format in which workflow knowledge is presented affects planning performance.
FlowBench provides a challenging environment for future research.
Abstract
LLM-based agents have emerged as promising tools, which are crafted to fulfill complex tasks by iterative planning and action. However, these agents are susceptible to undesired planning hallucinations when lacking specific knowledge for expertise-intensive tasks. To address this, preliminary attempts are made to enhance planning reliability by incorporating external workflow-related knowledge. Despite the promise, such infused knowledge is mostly disorganized and diverse in formats, lacking rigorous formalization and comprehensive comparisons. Motivated by this, we formalize different formats of workflow knowledge and present FlowBench, the first benchmark for workflow-guided planning. FlowBench covers 51 different scenarios from 6 domains, with knowledge presented in diverse formats. To assess different LLMs on FlowBench, we design a multi-tiered evaluation framework. We evaluate the…
Taxonomy
Topics: Semantic Web and Ontologies · Multi-Agent Systems and Negotiation · AI-based Problem Solving and Planning
