Training LLM-Based Agents with Synthetic Self-Reflected Trajectories and Partial Masking

Yihan Chen; Benfeng Xu; Xiaorui Wang; Yongdong Zhang; Zhendong Mao

arXiv:2505.20023·cs.CL·May 27, 2025

Open Access

TL;DR

This paper introduces STeP, a training method for LLM-based agents that combines synthesized self-reflected trajectories with partial masking, improving performance across multiple tasks while requiring less training data.

Contribution

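The paper presents STeP, an approach to training LLM agents that synthesizes self-reflected trajectories, which include reflections on and corrections of erroneous steps, and applies partial masking during training. It targets two limitations of training only on expert trajectories from teacher models: performance plateauing and error propagation.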

Findings

01. Improved agent performance on ALFWorld, WebShop, and SciWorld

02. LLaMA2-7B-Chat trained with STeP performs better while using less training data

03. Self-reflected trajectories make learning from teacher models more effective

Abstract

Autonomous agents, which perceive environments and take actions to achieve goals, have become increasingly feasible with the advancements in large language models (LLMs). However, current powerful agents often depend on sophisticated prompt engineering combined with closed-source LLMs like GPT-4. Although training open-source LLMs using expert trajectories from teacher models has yielded some improvements in agent capabilities, this approach still faces limitations such as performance plateauing and error propagation. To mitigate these challenges, we propose STeP, a novel method for improving LLM-based agent training. We synthesize self-reflected trajectories that include reflections and corrections of error steps, which enhance the effectiveness of LLM agents in learning from teacher models, enabling them to become agents capable of self-reflecting and correcting. We also introduce…
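The abstract above is cut off before it describes partial masking, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that tokens from environment observations and from agent steps flagged as erroneous are excluded from the training loss, while reflections and corrected actions are kept. The `Step` class, the `is_error` flag, the whitespace tokenization, and both function names are hypothetical and introduced purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One turn of a trajectory (hypothetical structure, not from the paper)."""
    tokens: List[str]       # whitespace tokens, an illustrative simplification
    source: str             # "env" for observations, "agent" for model outputs
    is_error: bool = False  # assumed flag: agent step that is later corrected


def build_loss_mask(steps: List[Step]) -> List[int]:
    """Return 1 for tokens that contribute to the loss and 0 otherwise.

    Assumption: only agent tokens from non-erroneous steps are trained on.
    """
    mask: List[int] = []
    for step in steps:
        keep = 1 if (step.source == "agent" and not step.is_error) else 0
        mask.extend([keep] * len(step.tokens))
    return mask


def masked_nll(token_log_probs: List[float], mask: List[int]) -> float:
    """Average negative log-likelihood over the unmasked tokens only."""
    if len(token_log_probs) != len(mask):
        raise ValueError("token_log_probs and mask must have the same length")
    kept = sum(mask)
    if kept == 0:
        return 0.0
    total = -sum(lp for lp, m in zip(token_log_probs, mask) if m)
    return total / kept


# A toy self-reflected trajectory: an erroneous action, the observation that
# follows it, then a reflection plus a corrected action.
trajectory = [
    Step("You are in a kitchen .".split(), "env"),
    Step("go to drawer 1".split(), "agent", is_error=True),
    Step("The drawer is empty .".split(), "env"),
    Step("Reflection : wrong place , try the fridge . go to fridge 1".split(), "agent"),
]

mask = build_loss_mask(trajectory)
# Placeholder per-token log-probabilities standing in for model outputs.
log_probs = [-2.0] * 15 + [-0.5] * 13
loss = masked_nll(log_probs, mask)
```

In this sketch the erroneous action stays in the context, so the model still conditions on it when producing the reflection, but it is never rewarded for generating it. Whether STeP masks exactly these tokens cannot be confirmed from the truncated abstract.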

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Robotic Path Planning Algorithms

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing · Multi-Head Attention · Layer Normalization · Byte Pair Encoding