AgentRefine: Enhancing Agent Generalization through Refinement Tuning

Dayuan Fu; Keqing He; Yejie Wang; Wentao Hong; Zhuoma Gongque; Weihao Zeng; Wei Wang; Jingang Wang; Xunliang Cai; Weiran Xu

arXiv:2501.01702 · cs.AI · February 25, 2025

TL;DR

This paper introduces AgentRefine, a novel framework that improves the generalization of LLM-based agents by training models to self-correct their mistakes based on environment observations, leading to better performance across diverse tasks and greater robustness.

Contribution

The paper proposes a new self-refinement tuning method for LLM agents that enhances their ability to generalize and adapt to new environments beyond those covered by manually built training data.
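The idea of learning to correct mistakes from environment observations can be made concrete at the level of training data. The following is a minimal sketch, assuming the tuning data consists of multi-turn trajectories in which a wrong action is followed by an error observation and then a corrected action. Excluding the erroneous action from the loss is an assumption made for illustration, not a detail stated above. `Turn`, `is_error` and `build_labels` are hypothetical names; only the value -100 follows a real convention (the default `ignore_index` of PyTorch's `CrossEntropyLoss`).

```python
from dataclasses import dataclass
from typing import List

# Default ignore_index of torch.nn.CrossEntropyLoss: label positions set to
# this value contribute nothing to the loss.
IGNORE_INDEX = -100


@dataclass
class Turn:
    role: str               # "env" (task text or observation) or "assistant" (agent action)
    token_ids: List[int]    # tokenized text of this turn
    is_error: bool = False  # hypothetical flag: an action the environment rejected


def build_labels(turns: List[Turn]) -> List[int]:
    """Build per-token labels for one refinement trajectory.

    Only assistant turns that are NOT marked as errors keep their token ids as
    targets. Environment turns and erroneous actions are masked out, so the
    model is trained on the corrected action while still seeing the mistake
    and the feedback as context.
    """
    labels: List[int] = []
    for turn in turns:
        trainable = turn.role == "assistant" and not turn.is_error
        labels.extend(turn.token_ids if trainable else [IGNORE_INDEX] * len(turn.token_ids))
    return labels


# Toy trajectory: wrong action -> error observation -> corrected action.
trajectory = [
    Turn("env", [11, 12]),                       # task description
    Turn("assistant", [21, 22], is_error=True),  # e.g. "Action: go to mug"
    Turn("env", [31]),                           # e.g. "Invalid action."
    Turn("assistant", [41, 42, 43]),             # e.g. "Action: go to countertop 1"
]
print(build_labels(trajectory))  # [-100, -100, -100, -100, -100, 41, 42, 43]
```

Keeping the mistake and its feedback in the context, rather than deleting them from the trajectory, is what lets the model condition its corrected action on the observation.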

Findings

01. AgentRefine outperforms state-of-the-art agent-tuning methods in generalization across diverse tasks.
02. It shows improved robustness against perturbations.
03. The approach enables diversified reasoning during inference.

Abstract

Large Language Model (LLM) based agents have proved their ability to perform complex tasks like humans. However, there is still a large gap between open-source LLMs and commercial models like the GPT series. In this paper, we focus on improving the agent generalization capabilities of LLMs via instruction tuning. We first observe that the existing agent training corpus yields satisfactory results on held-in evaluation sets but fails to generalize to held-out sets. Models trained by these agent-tuning works make severe formatting errors and frequently get stuck repeating the same mistake for many steps. Our analysis suggests that the poor generalization comes from overfitting to a few manually built agent environments and a lack of adaptation to new situations: the models struggle with wrong action steps and cannot learn from experience, instead merely memorizing existing observation-action relations. Inspired by the…
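The abstract names two observable symptoms of poor generalization: formatting errors and getting stuck repeating the same mistake. One hypothetical way to quantify both over an episode is sketched below, assuming a ReAct-style output in which each step contains an `Action:` line. The regular expression, the `max_repeats` threshold and the function names are illustrative choices, not the paper's evaluation code.

```python
import re
from typing import Dict, List, Optional

# Hypothetical ReAct-style format: the action sits on a line "Action: <text>".
ACTION_RE = re.compile(r"^Action:[ \t]*(\S.*)$", re.MULTILINE)


def parse_action(output: str) -> Optional[str]:
    """Return the action text, or None if the output is malformed."""
    match = ACTION_RE.search(output)
    return match.group(1).strip() if match else None


def diagnose(outputs: List[str], max_repeats: int = 3) -> Dict[str, object]:
    """Summarize format errors and repeated-action loops in one episode."""
    actions = [parse_action(o) for o in outputs]
    format_errors = sum(a is None for a in actions)

    # Longest run of consecutive, identical, well-formed actions.
    longest, run, prev = 0, 0, None
    for action in actions:
        if action is None:
            run = 0        # a malformed step breaks any run
        elif action == prev:
            run += 1       # same action as the previous step
        else:
            run = 1        # a new, different action starts a run
        prev = action
        longest = max(longest, run)

    return {
        "format_error_rate": format_errors / len(outputs) if outputs else 0.0,
        "longest_repeat": longest,
        "stuck": longest >= max_repeats,
    }


episode = [
    "Thought: I need the mug.\nAction: go to desk 1",
    "Action: go to desk 1",
    "Action: go to desk 1",
    "I should open the drawer.",  # no Action line: counted as a format error
]
print(diagnose(episode))
# {'format_error_rate': 0.25, 'longest_repeat': 3, 'stuck': True}
```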

Peer Reviews

Decision: ICLR 2025 Poster

Reviewer 01 · Rating: 6 · Confidence: 3

Strengths

The proposed method's idea resembles meta-learning, which trains the policy on diverse tasks so that it can quickly adapt to novel tasks. This idea makes sense to me and seems new in the agent domain. I appreciate the authors' rethinking of the generalization of agent-tuning. The issue of memorizing trajectories leading to overfitting seems valid to me. The experiments evaluate the performance of AgentRefine from a wide range of perspectives. The findings establish a correlation between agent generalization and…

Weaknesses

Overall, AgentRefine is a simple and effective method. However, the main idea is not new: as discussed in the related work, Agent-FLAN and AgentGen have already proposed training generalist agents on general data. The idea of refinement is also widely studied, as discussed in the introduction. I encourage the authors to clearly differentiate AgentRefine from these prior works, to highlight its unique aspects or improvements over existing methods, and to consider incorporating a comparative analysis to demonstrate the advantages…

Reviewer 02 · Rating: 6 · Confidence: 3

Strengths

1. The paper is well-organized and easy to follow, with a clear progression from motivation to methodology.
2. The identification of the generalization gap in existing LLM-based agents, and the proposal of a self-refinement approach to address it, is a rational step forward in the field.

Weaknesses

1. The problem of generalization in LLM-based agents has been extensively discussed in previous literature, making the contribution of this work less novel. For example, [1] investigates the robustness of accuracy measurements in large language models (LLMs) when the order of answer labels is shuffled, using the MMLU dataset as a testbed.
2. The methodology, while intuitive, lacks significant innovation, as the approach of enhancing generalization through data synthesis is not new [2].
3. The…
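The perturbation Reviewer 02 cites from [1], shuffling the order of answer labels on a multiple-choice benchmark such as MMLU, can be sketched as below. This is a hypothetical illustration of the idea: `shuffle_choices` is not taken from [1] or from AgentRefine, and it only assumes a question, a list of options, and the index of the correct option.

```python
import random
import string
from typing import List, Tuple


def shuffle_choices(question: str, choices: List[str], answer_idx: int,
                    seed: int = 0) -> Tuple[str, str]:
    """Permute multiple-choice options and return (prompt, new_answer_label)."""
    rng = random.Random(seed)                 # seeded for reproducible perturbations
    order = list(range(len(choices)))
    rng.shuffle(order)                        # order[k] = original index placed at slot k
    shuffled = [choices[i] for i in order]
    labels = string.ascii_uppercase[:len(choices)]
    lines = [f"{labels[k]}. {text}" for k, text in enumerate(shuffled)]
    new_answer = labels[order.index(answer_idx)]  # slot the correct option moved to
    return question + "\n" + "\n".join(lines), new_answer


prompt, answer = shuffle_choices("What is 2 + 2?", ["3", "4", "5", "6"], answer_idx=1, seed=7)
print(prompt)
print("correct label:", answer)
```

The correct label changes with the seed but always points at the option "4"; a robustness check would then compare a model's accuracy on the original and permuted prompts.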

Reviewer 03 · Rating: 6 · Confidence: 3

Strengths

1. This paper discusses the generalization ability of agents, which is a very important topic for the community.
2. The authors provide quantitative analysis to explain their insight, which is very convincing.
3. Synthesizing data with almost no task-specific information is a very practical setting, and the improvement in generalization ability shown in this paper is impressive.

Weaknesses

1. The presentation of this paper should be improved, and some grammatical mistakes should be fixed.
2. Some important baselines, for example, Reflexion [1], are missing and should be included.
3. The authors only consider decision-making tasks in their experiments. However, since they make claims about generalization ability, tasks of other types, such as reasoning tasks, should also be included.

[1] Shinn, Noah, et al. "Reflexion: Language agents with verbal reinforcement learning." NeurIPS, 2023.
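For context on the baseline Reviewer 03 requests: Reflexion [1] improves an agent across repeated trials by having it write verbal self-reflections on failed attempts into an episodic memory that conditions the next attempt, without updating model weights. This is a contrast with refinement tuning, which instills correction behavior through training. The simplified loop below illustrates that structure under stated assumptions. The LLM-based actor, evaluator and self-reflection model are replaced by toy functions, and none of the names come from the Reflexion codebase.

```python
from typing import Callable, List, Optional, Tuple


def reflexion_loop(
    act: Callable[[str, List[str]], str],
    evaluate: Callable[[str], bool],
    reflect: Callable[[str, str], str],
    task: str,
    max_trials: int = 3,
) -> Tuple[Optional[str], int, List[str]]:
    """Retry a task, feeding verbal self-reflections from failed trials back to the actor."""
    memory: List[str] = []                     # episodic memory of reflections
    for trial in range(1, max_trials + 1):
        attempt = act(task, memory)            # actor conditions on past reflections
        if evaluate(attempt):                  # evaluator signals success
            return attempt, trial, memory
        memory.append(reflect(task, attempt))  # self-reflection on the failure
    return None, max_trials, memory


# Toy stand-ins for the LLM-based actor, evaluator and self-reflection model.
CANDIDATES = ["Sydney", "Melbourne", "Canberra"]


def toy_act(task: str, memory: List[str]) -> str:
    # Pick the first candidate that no stored reflection has ruled out.
    return next(c for c in CANDIDATES if not any(c in note for note in memory))


def toy_evaluate(attempt: str) -> bool:
    return attempt == "Canberra"


def toy_reflect(task: str, attempt: str) -> str:
    return f"'{attempt}' was wrong for: {task}. Try a different answer."


answer, trials, notes = reflexion_loop(toy_act, toy_evaluate, toy_reflect,
                                       "capital of Australia")
print(answer, trials)  # Canberra 3
```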

Taxonomy

Topics: Multi-Agent Systems and Negotiation

Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Weight Decay · Multi-Head Attention · Discriminative Fine-Tuning · Layer Normalization · Byte Pair Encoding · Linear Warmup With Cosine Annealing