From Novice to Expert: LLM Agent Policy Optimization via Step-wise Reinforcement Learning

Zhirui Deng; Zhicheng Dou; Yutao Zhu; Ji-Rong Wen; Ruibin Xiong; Mang Wang; Weipeng Chen

arXiv:2411.03817·cs.AI·December 10, 2024


TL;DR

This paper introduces StepAgent, a reinforcement learning approach for LLM agents that uses step-wise rewards and expert comparisons to improve policy learning and performance in complex tasks.

Contribution

The paper proposes a novel step-wise reward mechanism and inverse reinforcement learning techniques to enhance LLM agent training, addressing sparse reward issues.
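To see why a single final scalar reward is a weak training signal, it helps to compute the per-step return-to-go G_t = sum over k >= t of gamma^(k-t) * r_k for one trajectory under both reward schemes. The sketch below is illustrative only: the trajectory length, the reward values, and the function name `discounted_returns` are hypothetical and are not taken from the paper.

```python
from typing import List


def discounted_returns(rewards: List[float], gamma: float = 1.0) -> List[float]:
    """Return-to-go G_t = sum_{k>=t} gamma^(k-t) * r_k for every step t."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backwards so each step accumulates the (discounted) rewards after it.
    for t in range(len(rewards) - 1, -1, -1):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# Sparse setting (hypothetical values): only the final step carries the outcome reward.
sparse = [0.0, 0.0, 0.0, 1.0]
# Step-wise setting (hypothetical values): each step receives its own intermediate reward.
stepwise = [0.25, 0.0, 0.25, 0.5]

print(discounted_returns(sparse))    # every step gets the same return
print(discounted_returns(stepwise))  # returns now reflect each step's contribution
```

With gamma = 1, the sparse trajectory assigns a return of 1.0 to every step. A step that helped and a step that did nothing therefore look identical to the learner. With step-wise rewards, the difference G_t - G_{t+1} recovers each step's own reward, so unhelpful steps can be told apart from useful ones.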

Findings

01. StepAgent outperforms baseline methods across datasets.

02. The agent's action distribution converges toward the expert's actions over the course of training.

03. Intermediate (step-wise) rewards improve the efficiency of policy learning.

Abstract

The outstanding capabilities of large language models (LLMs) render them a crucial component in various autonomous agent systems. While traditional methods depend on the inherent knowledge of LLMs without fine-tuning, more recent approaches have shifted toward the reinforcement learning strategy to further enhance agents' ability to solve complex interactive tasks with environments and tools. However, previous approaches are constrained by the sparse reward issue, where existing datasets solely provide a final scalar reward for each multi-step reasoning chain, potentially leading to ineffectiveness and inefficiency in policy learning. In this paper, we introduce StepAgent, which utilizes step-wise reward to optimize the agent's reinforcement learning process. Inheriting the spirit of novice-to-expert theory, we first compare the actions of the expert and the agent to automatically…
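The core idea of comparing agent and expert actions step by step can be illustrated with a deliberately simple stand-in. The sketch below assumes the agent and an expert demonstration are aligned step by step and uses plain string similarity from Python's `difflib` as the per-step reward. This similarity measure, the function name `stepwise_rewards`, and the example action strings are all hypothetical. They are not the reward formulation used by StepAgent.

```python
from difflib import SequenceMatcher
from typing import List


def stepwise_rewards(agent_actions: List[str], expert_actions: List[str]) -> List[float]:
    """Hypothetical per-step reward: textual similarity in [0, 1] between the agent's
    action and the expert's action at the same step. This is an illustrative
    assumption, not the paper's exact formulation."""
    rewards = []
    for t, expert in enumerate(expert_actions):
        if t >= len(agent_actions):
            # The agent stopped early, so missing steps earn no credit.
            rewards.append(0.0)
            continue
        rewards.append(SequenceMatcher(None, agent_actions[t], expert).ratio())
    # Agent steps beyond the expert trajectory are ignored in this sketch.
    return rewards


# Hypothetical web-shopping style actions, for illustration only.
expert = ["search[red running shoes]", "click[item 3]", "buy[item 3]"]
agent = ["search[red shoes]", "click[item 3]"]
print(stepwise_rewards(agent, expert))
```

In this example, the partially correct search step earns partial credit and the matching click earns full credit. The missing purchase step earns nothing. A final-outcome reward would instead collapse all of this into a single number for the whole trajectory.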


Taxonomy

Topics: Multi-Agent Systems and Negotiation