PRAct: Optimizing Principled Reasoning and Acting of LLM Agent

Zhiwei Liu; Weiran Yao; Jianguo Zhang; Rithesh Murthy; Liangwei Yang; Zuxin Liu; Tian Lan; Ming Zhu; Juntao Tan; Shirley Kokane; Thai Hoang; Juan Carlos Niebles; Shelby Heinecke; Huan Wang; Silvio Savarese; Caiming Xiong

arXiv:2410.18528·cs.AI·October 25, 2024

TL;DR

This paper presents PRAct, a framework that enables large language model agents to learn, adapt, and refine action principles through reflection and optimization, improving their performance across various environments.

Contribution

The paper introduces the RPO framework with reward-based and self-reflective methods, advancing how LLM agents learn and adapt action principles from trajectory data.

Findings

01. PRAct improves agent performance in multiple environments.

02. RPO effectively refines action principles through reflection.

03. Self-RPO enables learning without external rewards.

Abstract

We introduce the Principled Reasoning and Acting (PRAct) framework, a novel method for learning and enforcing action principles from trajectory data. Central to our approach is the use of text gradients from a reflection and optimization engine to derive these action principles. To adapt action principles to specific task requirements, we propose a new optimization framework, Reflective Principle Optimization (RPO). After execution, RPO employs a reflector to critique current action principles and an optimizer to update them accordingly. We develop the RPO framework under two scenarios: Reward-RPO, which uses environmental rewards for reflection, and Self-RPO, which conducts self-reflection without external rewards. Additionally, two RPO methods, RPO-Traj and RPO-Batch, are introduced to adapt to different settings. Experimental results across four environments demonstrate that the PRAct…
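The reflect-then-optimize loop described in the abstract can be pictured with a minimal Python sketch. Everything named here is hypothetical and not taken from the paper: the `Trajectory` type, the `rpo_traj` and `rpo_batch` functions, and the toy agent, reflector, and optimizer, which stand in for LLM calls with plain string logic. The split between a per-trajectory update and a single batched update is only one plausible reading of the names RPO-Traj and RPO-Batch; the abstract does not define them.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Trajectory:
    task: str
    steps: List[str]
    # None would model the Self-RPO setting (no external reward available).
    reward: Optional[float] = None


# Hypothetical signatures; in practice each would wrap an LLM call.
Agent = Callable[[str, str], Trajectory]             # (task, principles) -> trajectory
Reflector = Callable[[str, List[Trajectory]], str]   # (principles, trajectories) -> critique
Optimizer = Callable[[str, str], str]                # (principles, critique) -> new principles


def rpo_traj(tasks: List[str], principles: str, agent: Agent,
             reflector: Reflector, optimizer: Optimizer) -> str:
    """Assumed RPO-Traj reading: critique and update after every trajectory."""
    for task in tasks:
        traj = agent(task, principles)
        critique = reflector(principles, [traj])
        principles = optimizer(principles, critique)
    return principles


def rpo_batch(tasks: List[str], principles: str, agent: Agent,
              reflector: Reflector, optimizer: Optimizer) -> str:
    """Assumed RPO-Batch reading: collect all trajectories, then update once."""
    trajs = [agent(task, principles) for task in tasks]
    critique = reflector(principles, trajs)
    return optimizer(principles, critique)


# Toy stand-ins so the sketch runs without any model.
def toy_agent(task: str, principles: str) -> Trajectory:
    # The toy task succeeds only if the principles mention verification.
    ok = "verify" in principles
    return Trajectory(task=task, steps=[f"attempt {task}"],
                      reward=1.0 if ok else 0.0)


def toy_reflector(principles: str, trajs: List[Trajectory]) -> str:
    # Reward-based critique: flag trajectories whose reward is below 1.0.
    failed = [t.task for t in trajs if t.reward is not None and t.reward < 1.0]
    return f"add: verify results ({len(failed)} failures)" if failed else ""


def toy_optimizer(principles: str, critique: str) -> str:
    # Append a new principle only when the critique asks for it.
    if critique.startswith("add: verify") and "verify" not in principles:
        return principles + "\n- Always verify results before answering."
    return principles


if __name__ == "__main__":
    initial = "- Think step by step."
    print(rpo_traj(["a", "b"], initial, toy_agent, toy_reflector, toy_optimizer))
```

In this sketch the principles are a plain text block that is passed to the agent on every run, so an update made after one task affects how the next task is attempted. A Self-RPO variant would swap the reward-based reflector for one that critiques the trajectory text alone.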

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Semantic Web and Ontologies