Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking
Yihan Chen, Benfeng Xu, Xiaorui Wang, Yongdong Zhang, Zhendong Mao

TL;DR
This paper introduces STeP, a novel training method for LLM-based agents that synthesizes self-reflected trajectories with partial masking, leading to improved performance and reduced data requirements across multiple tasks.
Contribution
The paper presents STeP, an approach to LLM agent training that synthesizes self-reflected trajectories and applies partial masking. It targets two limitations of training on expert trajectories from teacher models alone: performance plateauing and error propagation.
Findings
Improved agent performance on ALFWorld, WebShop, and SciWorld
LLaMA2-7B-Chat trained with STeP achieves better performance with less training data
Self-reflected trajectories enhance learning effectiveness
Abstract
Autonomous agents, which perceive environments and take actions to achieve goals, have become increasingly feasible with the advancements in large language models (LLMs). However, current powerful agents often depend on sophisticated prompt engineering combined with closed-source LLMs like GPT-4. Although training open-source LLMs using expert trajectories from teacher models has yielded some improvements in agent capabilities, this approach still faces limitations such as performance plateauing and error propagation. To mitigate these challenges, we propose STeP, a novel method for improving LLM-based agent training. We synthesize self-reflected trajectories that include reflections and corrections of error steps, which enhance the effectiveness of LLM agents in learning from teacher models, enabling them to become agents capable of self-reflecting and correcting. We also introduce…
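The abstract is cut off before it describes the masking, so the following is only a minimal sketch of one plausible reading: when fine-tuning on a self-reflected trajectory, the tokens of erroneous agent steps are excluded from the loss, while the reflection and corrected steps are kept as training targets. The `Step` class, the `is_error` flag, and `build_labels` are hypothetical names introduced for illustration and are not taken from the paper. The value -100 is the default `ignore_index` of PyTorch's `CrossEntropyLoss`.

```python
from dataclasses import dataclass
from typing import List

IGNORE_INDEX = -100  # default ignore_index of PyTorch's CrossEntropyLoss


@dataclass
class Step:
    role: str                # "env" for observations, "agent" for model outputs
    token_ids: List[int]     # tokenized text of this step
    is_error: bool = False   # hypothetical flag: an agent step that is later corrected


def build_labels(trajectory: List[Step]) -> List[int]:
    """Return per-token labels, masking observations and erroneous agent steps.

    Assumption (not confirmed by the source): only agent steps that are not
    flagged as errors contribute to the training loss.
    """
    labels: List[int] = []
    for step in trajectory:
        if step.role == "agent" and not step.is_error:
            labels.extend(step.token_ids)  # learn from this step
        else:
            labels.extend([IGNORE_INDEX] * len(step.token_ids))  # skip in loss
    return labels


# Toy trajectory: observation, erroneous action, observation, reflection + correction.
trajectory = [
    Step("env", [11, 12]),
    Step("agent", [21, 22, 23], is_error=True),
    Step("env", [31]),
    Step("agent", [41, 42]),
]
labels = build_labels(trajectory)
```

Under this reading the erroneous step still appears in the model's input context, so the model can learn to reflect on it, but it is never used as a target to imitate.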
Taxonomy
Topics: Multi-Agent Systems and Negotiation · Robotic Path Planning Algorithms
Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Layer Normalization · Byte Pair Encoding
