Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement

Weimin Xiong; Yifan Song; Xiutian Zhao; Wenhao Wu; Xun Wang; Ke Wang; Cheng Li; Wei Peng; Sujian Li

arXiv:2406.11176 · cs.CL · September 26, 2024

TL;DR

This paper introduces the Iterative step-level Process Refinement (IPR) framework for training large language model agents, emphasizing detailed step-by-step guidance and contrastive learning to improve performance on complex tasks.

Contribution

The paper presents a novel IPR framework that incorporates step-level rewards and contrastive learning, enhancing agent training beyond outcome-based methods.

Findings

01 IPR outperforms strong baselines on complex tasks
02 Step-level rewards improve action efficiency
03 Framework is applicable to diverse models

Abstract

Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to enhance agent training. Specifically, we adopt the Monte Carlo method to estimate step-level rewards. During each iteration, the agent explores along the expert trajectory and generates new actions. These actions are then evaluated against the corresponding step of the expert trajectory using step-level rewards. Such comparison helps identify discrepancies, yielding contrastive action pairs that serve as…
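The reward-estimation and comparison steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions: `rollout` (completes an episode from an action prefix and returns an outcome reward in [0, 1]), `propose` (the agent's next action), and the rule that a pair is kept only when the expert action's estimated reward exceeds the agent's by `margin` are hypothetical names and choices, not the paper's or the repository's API. The toy rollout is deterministic so the output is reproducible; a real rollout would sample completions from a policy, which is what makes averaging over `n_samples` meaningful.

```python
from typing import Callable, List, Tuple

# Hypothetical interface: completes an episode from an action prefix and
# returns the final outcome reward in [0, 1].
Rollout = Callable[[List[str]], float]
# Hypothetical interface: the agent's next action given the action history.
Propose = Callable[[List[str]], str]


def mc_step_reward(history: List[str], action: str, rollout: Rollout,
                   n_samples: int = 8) -> float:
    """Monte Carlo estimate of the step-level reward of `action` after `history`:
    the mean outcome reward over `n_samples` completions of history + [action]."""
    prefix = history + [action]
    return sum(rollout(prefix) for _ in range(n_samples)) / n_samples


def contrastive_pairs(expert: List[str], propose: Propose, rollout: Rollout,
                      n_samples: int = 8,
                      margin: float = 0.0) -> List[Tuple[List[str], str, str]]:
    """Explore along the expert trajectory and return (history, preferred,
    dispreferred) triples wherever the agent's action scores lower."""
    pairs = []
    for t, expert_action in enumerate(expert):
        history = expert[:t]  # the agent starts from the expert's prefix
        agent_action = propose(history)
        if agent_action == expert_action:
            continue  # nothing to contrast at this step
        r_expert = mc_step_reward(history, expert_action, rollout, n_samples)
        r_agent = mc_step_reward(history, agent_action, rollout, n_samples)
        if r_expert - r_agent > margin:  # assumed comparison rule
            pairs.append((history, expert_action, agent_action))
    return pairs


# Toy usage: any prefix containing "bad" fails the task (deterministic stand-in).
def toy_rollout(prefix: List[str]) -> float:
    return 0.0 if "bad" in prefix else 1.0


def toy_propose(history: List[str]) -> str:
    # Deviates from the expert only at step 1.
    return "bad" if len(history) == 1 else ["a", "b", "c"][len(history)]


pairs = contrastive_pairs(["a", "b", "c"], toy_propose, toy_rollout)
print(pairs)  # → [(['a'], 'b', 'bad')]
```

Each triple can then be fed to a pairwise preference loss, with the expert action as the preferred one.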

Code & Models

Repositories

weiminxiong/ipr (official, PyTorch)
