Testing and Understanding Erroneous Planning in LLM Agents through Synthesized User Inputs

Zhenlan Ji; Daoyuan Wu; Pingchuan Ma; Zongjie Li; Shuai Wang

arXiv:2404.17833 · cs.AI · April 30, 2024 · 1 cites

Open Access

TL;DR

This paper introduces PDoctor, an automated testing framework that detects erroneous planning in LLM-based agents by synthesizing user inputs and formulating constraint satisfaction problems, improving reliability in complex tasks.

Contribution

PDoctor is the first automated approach to test LLM agents' planning by translating user requirements into constraints and detecting errors through constraint solving.
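
As a rough illustration of the idea of turning user requirements into constraints, the toy sketch below checks a plan (an ordered list of action names) against two requirement-derived constraints. It is not PDoctor's implementation: the action names, the helpers `must_include`, `must_precede`, and `violated` are all hypothetical, and the sketch only checks a given plan against fixed constraints rather than invoking a constraint solver.

```python
from typing import Callable, Dict, List

Plan = List[str]                      # an ordered list of action names
Constraint = Callable[[Plan], bool]   # returns True when the plan satisfies it


def must_include(action: str) -> Constraint:
    """Hypothetical constraint: the plan must contain `action`."""
    return lambda plan: action in plan


def must_precede(first: str, second: str) -> Constraint:
    """Hypothetical constraint: `first` must occur before `second`."""
    def check(plan: Plan) -> bool:
        if first not in plan or second not in plan:
            return False
        return plan.index(first) < plan.index(second)
    return check


def violated(plan: Plan, constraints: Dict[str, Constraint]) -> List[str]:
    """Return the names of all constraints the plan fails to satisfy."""
    return [name for name, ok in constraints.items() if not ok(plan)]


# Constraints a user request might imply (illustrative only).
constraints = {
    "book_flight is present": must_include("book_flight"),
    "check_budget before book_flight": must_precede("check_budget", "book_flight"),
}

good_plan = ["check_budget", "book_flight", "send_itinerary"]
bad_plan = ["book_flight", "check_budget"]

print(violated(good_plan, constraints))  # no violations: plan is acceptable
print(violated(bad_plan, constraints))   # ordering violated: plan is erroneous
```

Under this framing, a plan counts as erroneous whenever `violated` returns a non-empty list.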

Findings

01. PDoctor effectively detects diverse planning errors in LLM agents.

02. It provides valuable insights into error characteristics for developers.

03. The approach works with multiple agent frameworks and LLMs.

Abstract

Agents based on large language models (LLMs) have demonstrated effectiveness in solving a wide range of tasks by integrating LLMs with key modules such as planning, memory, and tool usage. Increasingly, customers are adopting LLM agents across a variety of commercial applications critical to reliability, including support for mental well-being, chemical synthesis, and software development. Nevertheless, our observations and daily use of LLM agents indicate that they are prone to making erroneous plans, especially when the tasks are complex and require long-term planning. In this paper, we propose PDoctor, a novel and automated approach to testing LLM agents and understanding their erroneous planning. As the first work in this direction, we formulate the detection of erroneous planning as a constraint satisfiability problem: an LLM agent's plan is considered erroneous if its execution…

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Logic, Reasoning, and Knowledge · AI-based Problem Solving and Planning