EPO: Hierarchical LLM Agents with Environment Preference Optimization

Qi Zhao; Haotian Fu; Chen Sun; George Konidaris

arXiv:2408.16090 · cs.LG · October 7, 2024

TL;DR

This paper introduces a hierarchical LLM agent framework with Environment Preference Optimization that improves long-horizon decision-making by decomposing tasks, automatically generating reward signals from environment feedback, and achieving state-of-the-art results on ALFRED.

Contribution

The paper presents a novel hierarchical framework with Environment Preference Optimization that leverages environment feedback for reward generation, enhancing LLM agent performance on complex tasks.

Findings

01. Achieved first place on the ALFRED leaderboard.

02. Demonstrated superior performance over existing methods.

03. Validated the effectiveness of environment-based reward signals.

Abstract

Long-horizon decision-making tasks present significant challenges for LLM-based agents due to the need for extensive planning over multiple steps. In this paper, we propose a hierarchical framework that decomposes complex tasks into manageable subgoals, utilizing separate LLMs for subgoal prediction and low-level action generation. To address the challenge of creating training signals for unannotated datasets, we develop a reward model that leverages multimodal environment feedback to automatically generate reward signals. We introduce Environment Preference Optimization (EPO), a novel method that generates preference signals from the environment's feedback and uses them to train LLM-based agents. Extensive experiments on ALFRED demonstrate the state-of-the-art performance of our framework, achieving first place on the ALFRED public leaderboard and showcasing its potential to improve…
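The idea of turning environment feedback into preference signals can be illustrated with a small sketch. It is not the authors' implementation: it assumes a reward score per candidate output is already available, and it substitutes a generic DPO-style pairwise loss for whatever objective the paper actually uses. The function names, the `margin` and `beta` parameters, and the example subgoal strings are all hypothetical.

```python
import math
from itertools import combinations


def preference_pairs(candidates, rewards, margin=0.0):
    """Turn per-candidate reward scores into (chosen, rejected) pairs.

    Assumption: `rewards[i]` is a scalar score for `candidates[i]`,
    e.g. produced by a reward model that reads environment feedback.
    Pairs whose scores differ by no more than `margin` are skipped.
    """
    pairs = []
    for i, j in combinations(range(len(candidates)), 2):
        if abs(rewards[i] - rewards[j]) <= margin:
            continue  # too close to call, so no preference signal
        if rewards[i] > rewards[j]:
            pairs.append((candidates[i], candidates[j]))
        else:
            pairs.append((candidates[j], candidates[i]))
    return pairs


def dpo_style_loss(logp_chosen, logp_rejected,
                   ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Generic DPO-style pairwise loss on sequence log-probabilities.

    Computes -log(sigmoid(beta * margin)), where the margin compares
    how much the policy prefers the chosen output over the rejected
    one relative to a frozen reference model. This is a stand-in
    objective, not necessarily the one used in the paper.
    """
    logit = beta * ((logp_chosen - ref_logp_chosen)
                    - (logp_rejected - ref_logp_rejected))
    # Numerically stable softplus(-logit).
    if logit >= 0:
        return math.log1p(math.exp(-logit))
    return -logit + math.log1p(math.exp(logit))


# Hypothetical subgoal candidates with made-up reward scores.
subgoals = ["pickup apple", "goto fridge", "heat apple"]
scores = [0.9, 0.2, 0.9]
pairs = preference_pairs(subgoals, scores)
```

In this sketch the two candidates with equal scores produce no pair, so only comparisons with a clear winner contribute a training signal. The loss decreases as the policy assigns relatively more probability to the chosen output than the reference model does.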

Code & Models

Repositories

kevinz8866/epo
PyTorch · Official

Taxonomy

Topics: Multi-Agent Systems and Negotiation · Service-Oriented Architecture and Web Services · Mobile Agent-Based Network Management