ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL

Yifei Zhou; Andrea Zanette; Jiayi Pan; Sergey Levine; Aviral Kumar

arXiv:2402.19446 · cs.LG · March 1, 2024 · 2 cites

Open Access · 2 Repos · 1 Dataset

TL;DR

ArCHer is a hierarchical multi-turn RL framework for fine-tuning large language models on goal-directed agent tasks. It targets multi-turn interactions and delayed rewards, and reports roughly 100x better sample efficiency than existing methods.

Contribution

This paper presents ArCHer, a hierarchical RL framework for multi-turn decision-making with LLMs. It addresses the limitations of single-turn RL methods and improves sample efficiency and scalability.

Findings

01. ArCHer attains about 100x better sample efficiency than existing methods.

02. Performance on agent tasks improves with larger model capacity, up to the 7-billion-parameter scale.

03. The hierarchical approach handles long-horizon, multi-turn interactions.
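
To make the delayed-reward, multi-turn setting concrete, here is a minimal sketch of turn-level credit assignment using standard discounted returns. It is not ArCHer's algorithm: the function name `turn_level_returns`, the four-turn episode, and the discount factor are all hypothetical choices for illustration.

```python
from typing import List


def turn_level_returns(rewards: List[float], gamma: float = 0.95) -> List[float]:
    """Discounted return-to-go for each turn of a single episode.

    rewards[t] is the reward received after turn t. This is the textbook
    definition G_t = r_t + gamma * G_{t+1}, not a method from the paper.
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each turn accumulates the rewards that follow it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 4-turn interaction: the task only pays off at the final turn.
episode_rewards = [0.0, 0.0, 0.0, 1.0]
returns = turn_level_returns(episode_rewards, gamma=0.5)
print(returns)
```

Every earlier turn gets a nonzero learning signal even though its immediate reward is zero. A purely single-turn objective would see only the per-turn rewards and so could not credit the early turns.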

Abstract

A broad use case of large language models (LLMs) is in goal-directed decision-making tasks (or "agent" tasks), where an LLM needs to not just generate completions for a given prompt, but rather make intelligent decisions over a multi-turn interaction to accomplish a task (e.g., when interacting with the web, using tools, or providing customer support). Reinforcement learning (RL) provides a general paradigm to address such agent tasks, but current RL methods for LLMs largely focus on optimizing single-turn rewards. By construction, most single-turn RL methods cannot endow LLMs with the ability to intelligently seek information over multiple turns, perform credit assignment, or reason about their past actions -- all of which are critical in agent tasks. This raises the question: how can we design effective and efficient multi-turn RL algorithms for LLMs? In this paper, we develop a…

Code & Models

Repositories

Datasets

Jessie09/OfflineArcher (dataset, 105 downloads)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Focus