Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement
Weimin Xiong, Yifan Song, Xiutian Zhao, Wenhao Wu, Xun Wang, Ke Wang, Cheng Li, Wei Peng, Sujian Li

TL;DR
This paper introduces the Iterative step-level Process Refinement (IPR) framework for training large language model agents, emphasizing detailed step-by-step guidance and contrastive learning to improve performance on complex tasks.
Contribution
The paper presents a novel IPR framework that incorporates step-level rewards and contrastive learning, enhancing agent training beyond outcome-based methods.
Findings
IPR outperforms strong baselines on complex tasks
Step-level rewards improve action efficiency
Framework is applicable to diverse models
Abstract
Large language model agents have exhibited exceptional performance across a range of complex interactive tasks. Recent approaches have utilized tuning with expert trajectories to enhance agent performance, yet they primarily concentrate on outcome rewards, which may lead to errors or suboptimal actions due to the absence of process supervision signals. In this paper, we introduce the Iterative step-level Process Refinement (IPR) framework, which provides detailed step-by-step guidance to enhance agent training. Specifically, we adopt the Monte Carlo method to estimate step-level rewards. During each iteration, the agent explores along the expert trajectory and generates new actions. These actions are then evaluated against the corresponding step of the expert trajectory using step-level rewards. Such comparison helps identify discrepancies, yielding contrastive action pairs that serve as…
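The abstract describes two mechanics: a Monte Carlo estimate of step-level rewards, and a comparison of the agent's action with the expert's action at the same step to yield contrastive action pairs. The Python sketch below illustrates both under stated assumptions. The rollout policy, the binary outcome reward, the `margin` threshold, and every name in it (`mc_step_reward`, `contrastive_pairs`, `toy_rollout`, `toy_reward`, `toy_agent`, `GOAL`) are hypothetical and not taken from the paper. How the pairs are then used for training is cut off in the abstract and is not shown here.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical interfaces, not the paper's code:
#   rollout(prefix, rng) -> a completed trajectory continuing from `prefix`
#   outcome_reward(trajectory) -> final task reward, e.g. in [0, 1]
Prefix = Tuple[str, ...]
Rollout = Callable[[Prefix, random.Random], Prefix]
OutcomeReward = Callable[[Prefix], float]
Proposer = Callable[[Prefix, random.Random], str]


def mc_step_reward(prefix: Prefix, action: str, rollout: Rollout,
                   outcome_reward: OutcomeReward, num_samples: int,
                   rng: random.Random) -> float:
    """Monte Carlo step reward: mean outcome reward of `num_samples`
    completions sampled after taking `action` from `prefix`."""
    start = prefix + (action,)
    total = sum(outcome_reward(rollout(start, rng)) for _ in range(num_samples))
    return total / num_samples


def contrastive_pairs(expert: Sequence[str], propose: Proposer,
                      rollout: Rollout, outcome_reward: OutcomeReward,
                      num_samples: int = 8, margin: float = 0.0,
                      seed: int = 0) -> List[Tuple[Prefix, str, str]]:
    """Walk along the expert trajectory. At each step the agent proposes an
    action; keep (prefix, expert_action, agent_action) when the agent's
    estimated step reward is lower than the expert's by more than `margin`."""
    rng = random.Random(seed)
    pairs = []
    for t, expert_action in enumerate(expert):
        prefix = tuple(expert[:t])  # expert prefix before step t
        agent_action = propose(prefix, rng)
        if agent_action == expert_action:
            continue  # no discrepancy at this step
        r_expert = mc_step_reward(prefix, expert_action, rollout,
                                  outcome_reward, num_samples, rng)
        r_agent = mc_step_reward(prefix, agent_action, rollout,
                                 outcome_reward, num_samples, rng)
        if r_expert - r_agent > margin:
            pairs.append((prefix, expert_action, agent_action))
    return pairs


# Hypothetical toy task: the expert trajectory is also the only successful one.
GOAL = ("go to shelf", "take mug", "put mug in sink")


def toy_rollout(prefix: Prefix, rng: random.Random) -> Prefix:
    # Noisy completion policy: picks the correct next action 80% of the time.
    traj = list(prefix)
    while len(traj) < len(GOAL):
        correct = GOAL[len(traj)]
        traj.append(correct if rng.random() < 0.8 else "look around")
    return tuple(traj)


def toy_reward(traj: Prefix) -> float:
    # Binary outcome reward: 1.0 only if the whole task was completed.
    return 1.0 if traj == GOAL else 0.0


def toy_agent(prefix: Prefix, rng: random.Random) -> str:
    # An agent that deviates from the expert only at the second step.
    return "look around" if len(prefix) == 1 else GOAL[len(prefix)]


print(contrastive_pairs(GOAL, toy_agent, toy_rollout, toy_reward))
```

In this toy setup, any completion that contains the agent's wrong action scores 0, so its step reward is 0, while the expert's action at that step keeps a positive chance of success. The only discrepancy step therefore becomes a contrastive pair. Real agent environments would replace the toy rollout and reward with sampled LLM continuations and the task's own outcome signal.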
