AppBench: Planning of Multiple APIs from Various APPs for Complex User Instruction
Hongru Wang, Rui Wang, Boyang Xue, Heming Xia, Jingtao Cao, Zeming Liu, Jeff Z. Pan, Kam-Fai Wong

TL;DR
AppBench is a new benchmark designed to evaluate large language models' ability to plan and execute multiple APIs from various sources for complex user instructions, highlighting current limitations of state-of-the-art models.
Contribution
This paper introduces the first benchmark for multi-API planning from diverse sources, addressing complex dependencies and permission constraints in real-world scenarios.
Findings
GPT-4o achieves only a 2.0% success rate on complex tasks
Existing LLMs struggle with multi-API planning and execution
Benchmark provides a new standard for evaluating multi-API capabilities
Abstract
Large Language Models (LLMs) can interact with the real world by connecting with versatile external APIs, resulting in better problem-solving and task automation capabilities. Previous research primarily focuses on APIs with limited arguments from a single source or overlooks the complex dependency relationship between different APIs. However, it is essential to utilize multiple APIs collaboratively from various sources (e.g., different Apps in the iPhone), especially for complex user instructions. In this paper, we introduce AppBench, the first benchmark to evaluate LLMs' ability to plan and execute multiple APIs from various sources in order to complete the user's task. Specifically, we consider two significant challenges in multiple APIs: 1) graph structures: some APIs can be executed independently while others need to be executed one by one, resulting in graph-like…
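The graph-structure challenge in the abstract can be illustrated with a small sketch. It is not taken from AppBench: the API names and the dependency map are hypothetical, invented only to show how calls that are independent can share a stage while dependent calls must wait. It assumes Python 3.9+ for the standard-library graphlib module.

```python
from graphlib import TopologicalSorter

# Hypothetical API names and dependencies, invented for illustration only.
# Each key maps to the set of calls whose outputs it needs first.
dependencies = {
    "restaurants.find": set(),
    "weather.get_forecast": set(),
    "restaurants.reserve": {"restaurants.find", "weather.get_forecast"},
    "calendar.add_event": {"restaurants.reserve"},
}

def execution_stages(deps):
    """Group API calls into stages; calls within one stage do not depend on each other."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    stages = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # every call whose prerequisites are done
        stages.append(ready)
        sorter.done(*ready)  # mark the stage finished so dependents become ready
    return stages

stages = execution_stages(dependencies)
for number, stage in enumerate(stages, start=1):
    print(f"stage {number}: {stage}")
```

In this toy graph the two lookup calls have no prerequisites and land in the first stage, while the reservation and the calendar entry must follow one by one. A planner evaluated on such tasks would need to recover an ordering of this kind from the user instruction alone.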
Taxonomy
Topics: Web Applications and Data Management · Service-Oriented Architecture and Web Services · Mobile and Web Applications
