ArCHer: Training Language Model Agents via Hierarchical Multi-Turn RL
Yifei Zhou, Andrea Zanette, Jiayi Pan, Sergey Levine, Aviral Kumar

TL;DR
ArCHer introduces a hierarchical multi-turn RL framework for fine-tuning large language models, significantly enhancing efficiency and performance in goal-directed agent tasks by effectively managing multi-turn interactions and delayed rewards.
Contribution
This paper presents ArCHer, a hierarchical RL framework that enables multi-turn decision-making in LLMs, addressing limitations of single-turn RL methods and improving sample efficiency and scalability.
Findings
ArCHer is about 100x more sample-efficient than existing methods.
Its performance on agent tasks improves with model capacity, up to the 7 billion parameter scale tested.
The hierarchical approach effectively manages long-horizon, multi-turn interactions.
Abstract
A broad use case of large language models (LLMs) is in goal-directed decision-making tasks (or "agent" tasks), where an LLM needs to not just generate completions for a given prompt, but rather make intelligent decisions over a multi-turn interaction to accomplish a task (e.g., when interacting with the web, using tools, or providing customer support). Reinforcement learning (RL) provides a general paradigm to address such agent tasks, but current RL methods for LLMs largely focus on optimizing single-turn rewards. By construction, most single-turn RL methods cannot endow LLMs with the ability to intelligently seek information over multiple turns, perform credit assignment, or reason about their past actions -- all of which are critical in agent tasks. This raises the question: how can we design effective and efficient multi-turn RL algorithms for LLMs? In this paper, we develop a…
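To make the credit-assignment point concrete, here is a minimal sketch of how a reward that only arrives at the end of a multi-turn episode can be propagated back to earlier turns as a discounted return-to-go. This is a generic RL illustration, not ArCHer's algorithm: the function name `returns_to_go`, the four-turn episode, and the discount factor are all assumptions chosen for the example.

```python
from typing import List


def returns_to_go(turn_rewards: List[float], gamma: float = 0.95) -> List[float]:
    """Discounted return-to-go for every turn of one episode.

    Illustrative helper (hypothetical name): turn t receives
    r_t + gamma * r_{t+1} + gamma^2 * r_{t+2} + ...
    """
    returns = [0.0] * len(turn_rewards)
    running = 0.0
    # Walk backwards so each turn accumulates the discounted future reward.
    for t in reversed(range(len(turn_rewards))):
        running = turn_rewards[t] + gamma * running
        returns[t] = running
    return returns


# Hypothetical 4-turn episode: the task only pays off at the final turn.
rewards = [0.0, 0.0, 0.0, 1.0]

# A purely single-turn view scores each turn by its immediate reward,
# so the first three turns receive no learning signal at all.
single_turn_signal = rewards

# A multi-turn view credits earlier turns for the eventual success.
multi_turn_signal = returns_to_go(rewards, gamma=0.5)

print(single_turn_signal)  # [0.0, 0.0, 0.0, 1.0]
print(multi_turn_signal)   # [0.125, 0.25, 0.5, 1.0]
```

In the single-turn view, an information-seeking action at turn 1 looks worthless. In the multi-turn view it receives a nonzero signal because it contributed to the final outcome.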
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Focus
