Trial and Error: Exploration-Based Trajectory Optimization for LLM Agents

Yifan Song; Da Yin; Xiang Yue; Jie Huang; Sujian Li; Bill Yuchen Lin

arXiv:2403.02502 · cs.CL · July 11, 2024 · 1 citation

TL;DR

This paper introduces ETO, an exploration-based trajectory optimization method for LLM agents that learns from failures to improve performance through iterative exploration and contrastive training.

Contribution

The study presents a novel approach allowing LLM agents to learn from exploration failures, enhancing performance beyond traditional methods that only use successful trajectories.

Findings

01. ETO outperforms baseline methods on complex tasks
02. Learning from failures improves agent performance
03. Effective in scenarios without expert trajectories

Abstract

Large Language Models (LLMs) have become integral components in various autonomous agent systems. In this study, we present an exploration-based trajectory optimization approach, referred to as ETO. This learning method is designed to enhance the performance of open LLM agents. Contrary to previous studies that exclusively train on successful expert trajectories, our method allows agents to learn from their exploration failures. This leads to improved performance through an iterative optimization framework. During the exploration phase, the agent interacts with the environment while completing given tasks, gathering failure trajectories to create contrastive trajectory pairs. In the subsequent training phase, the agent utilizes these trajectory preference pairs to update its policy using contrastive learning methods like DPO. This iterative cycle of exploration and training fosters…
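The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Trajectory` class, the `build_pairs` helper, the reward scale, and the choice to pair each explored trajectory with a reference trajectory for the same task are all assumptions made for illustration. The `dpo_loss` function is the standard DPO objective for a single preference pair, computed from summed log-probabilities that a real system would obtain from the policy and a frozen reference model.

```python
import math
from dataclasses import dataclass


@dataclass
class Trajectory:
    """Hypothetical container for one agent rollout."""
    task_id: str
    actions: list[str]
    reward: float  # final task reward from the environment (assumed scale)


def build_pairs(reference, explored):
    """Pair each explored trajectory with the reference trajectory of the
    same task, keeping only pairs whose rewards differ (assumed rule)."""
    pairs = []
    for traj in explored:
        ref = reference.get(traj.task_id)
        if ref is None or ref.reward == traj.reward:
            continue  # no preference signal without a reward gap
        win, lose = (ref, traj) if ref.reward > traj.reward else (traj, ref)
        pairs.append((win, lose))
    return pairs


def softplus(x):
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_loss(policy_logp_win, policy_logp_lose,
             ref_logp_win, ref_logp_lose, beta=0.1):
    """DPO loss for one pair: -log sigmoid(beta * (win margin - lose margin))."""
    margin = beta * ((policy_logp_win - ref_logp_win)
                     - (policy_logp_lose - ref_logp_lose))
    return softplus(-margin)  # equals -log(sigmoid(margin))


if __name__ == "__main__":
    reference = {"t1": Trajectory("t1", ["search", "buy"], 1.0)}
    explored = [Trajectory("t1", ["search", "quit"], 0.0)]
    for win, lose in build_pairs(reference, explored):
        print(win.actions, "preferred over", lose.actions)
    # Placeholder log-probabilities; a real run would score both trajectories
    # with the current policy and the frozen reference model.
    print(dpo_loss(-1.0, -5.0, -2.0, -2.0))
```

In an iterative setup, the updated policy would be used to explore again, producing fresh pairs for the next round of training. How many rounds to run and how pairs are filtered are design choices this sketch leaves open.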

Code & Models

Datasets

agent-eto/eto-sft-trajectory · dataset · 208 dl

Taxonomy

Topics: Robotic Path Planning Algorithms · Simulation Techniques and Applications

Methods: Direct Preference Optimization · Contrastive Learning