AgentTuning: Enabling Generalized Agent Abilities for LLMs

Aohan Zeng; Mingdao Liu; Rui Lu; Bowen Wang; Xiao Liu; Yuxiao Dong; Jie Tang

arXiv:2310.12823 · cs.CL · October 24, 2023 · 2 cites

TL;DR

AgentTuning is a novel instruction-tuning approach that enhances large language models' abilities to perform complex agent tasks while preserving their general language capabilities, bridging the gap with commercial models.

Contribution

The paper introduces AgentTuning, a simple, general instruction-tuning method with a new dataset, improving LLMs' agent abilities without sacrificing their overall performance.

Findings

01. AgentTuning enables LLMs to perform complex agent tasks effectively.

02. AgentLM-70B matches GPT-3.5-turbo on unseen agent tasks.

03. Open-sourced models and datasets promote accessible development.

Abstract

Open large language models (LLMs) with great performance in various tasks have significantly advanced the development of LLMs. However, they are far inferior to commercial models such as ChatGPT and GPT-4 when acting as agents to tackle complex tasks in the real world. These agent tasks employ LLMs as the central controller responsible for planning, memorization, and tool utilization, necessitating both fine-grained prompting methods and robust LLMs to achieve satisfactory performance. Though many prompting methods have been proposed to complete particular agent tasks, there is a lack of research focusing on improving the agent capabilities of LLMs themselves without compromising their general abilities. In this work, we present AgentTuning, a simple and general method to enhance the agent abilities of LLMs while maintaining their general LLM capabilities. We construct AgentInstruct, a…

Peer Reviews

Decision · Submitted to ICLR 2024

Reviewer 01 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

- The motivation to improve the agent ability of open-source LLMs is good.
- It is well-written and the idea it presents is clear.
- The evaluation is extensive and the results look promising.

Weaknesses

- Some details of the dataset construction are unclear.
- The training strategy used for instruction-tuning is limited.
- The rationale behind some design choices needs more explanation.

Reviewer 02 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

1. Agent tuning is an exciting and important direction to study for the LLMs as intelligent agents.
2. The authors' data/training/model have been well-documented. The results should be reproducible.

Weaknesses

1. Figure 1 (b): I don't think the message of this figure is fair, since you trained on AgentBench (although partly), but the other LLMs have not been trained on AgentBench. One of the downsides of open-source LLMs is their limited ability to generalize to settings 'different' from training, but the proposed work has essentially made AgentBench in-distribution by training.
2. It seems that GPT models are heavily relied on for generating training data. Do we have some sense of how to go beyond GPT models? Su

Reviewer 03 · Rating 6 (marginally above the acceptance threshold) · Confidence 4

Strengths

The paper is written well and easy to follow. It presents a set of expensive experiments, showcasing that open-source LLMs can be competitive with proprietary LLMs when trained on the right data.

Weaknesses

While the empirical contribution is significant, the paper overall feels incremental with straightforward improvements over prior instruction tuning and knowledge distillation. Some of the design decisions are also not explained.
1. While the agent trajectories are very valuable and costly to collect, they are mainly extracted from public tasks/benchmarks by using ReAct with GPT models. The overall process with instruction generation, trajectory collection, and filtering can be useful for other

Code & Models

Repositories

thudm/agenttuning · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Attention Is All You Need · Cosine Annealing · Linear Warmup With Cosine Annealing · Linear Layer · Layer Normalization · Attention Dropout · Softmax · Dense Connections