Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents

Renxi Wang; Haonan Li; Xudong Han; Yixuan Zhang; Timothy Baldwin

arXiv:2402.11651 · cs.CL · April 17, 2024 · 2 cites

Open Access · 1 Repo · 1 Model

TL;DR

This paper demonstrates that incorporating unsuccessful trajectories as negative examples during fine-tuning significantly improves large language models' performance as agents across various reasoning and question-answering tasks.

Contribution

The study introduces a novel approach of using negative trajectories with quality control in fine-tuning LLMs, showing substantial performance gains and better resource utilization.

Findings

01 Improved performance on mathematical reasoning tasks.

02 Enhanced multi-hop and strategic question answering.

03 Better trade-off between valuable information and errors in unsuccessful trajectories.

Abstract

Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines. However, LLMs are optimized for language generation instead of tool use during training or alignment, limiting their effectiveness as agents. To resolve this problem, previous work has first collected interaction trajectories between LLMs and environments, using only trajectories that successfully finished the task to fine-tune smaller models, making fine-tuning data scarce and acquiring it both difficult and costly. Discarding failed trajectories also leads to significant wastage of data and resources and limits the possible optimization paths during fine-tuning. In this paper, we argue that unsuccessful trajectories offer valuable insights, and LLMs can learn from these trajectories through appropriate quality control and fine-tuning…
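The data-preparation idea in the abstract (keep failed trajectories rather than discard them, apply quality control, and let the model tell the two kinds apart during fine-tuning) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Trajectory` type, the two tag strings, the `passes_quality_control` heuristic, and the `max_steps` threshold are all hypothetical names and choices introduced here, since the excerpt above does not specify how quality control or the fine-tuning format is defined.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    """One interaction between an LLM agent and its environment (hypothetical type)."""
    task: str
    steps: List[str]   # e.g. thoughts, tool calls, observations
    success: bool      # whether the task was finished correctly


# Hypothetical markers; the wording is an assumption for illustration only.
POSITIVE_TAG = "[successful trajectory]"
NEGATIVE_TAG = "[unsuccessful trajectory]"


def passes_quality_control(traj: Trajectory, max_steps: int = 10) -> bool:
    """Assumed filter: drop empty, overlong, or step-repeating trajectories."""
    if not traj.steps or len(traj.steps) > max_steps:
        return False
    # A trajectory that repeats an identical step is treated as degenerate.
    return len(set(traj.steps)) == len(traj.steps)


def build_finetuning_examples(
    trajectories: List[Trajectory], max_steps: int = 10
) -> List[Dict[str, str]]:
    """Keep both successful and failed trajectories, tagging each one."""
    examples = []
    for traj in trajectories:
        if not passes_quality_control(traj, max_steps):
            continue
        tag = POSITIVE_TAG if traj.success else NEGATIVE_TAG
        examples.append(
            {
                "prompt": f"{tag}\n{traj.task}",
                "completion": "\n".join(traj.steps),
                "label": "positive" if traj.success else "negative",
            }
        )
    return examples
```

Under these assumptions, a success-only pipeline corresponds to dropping every example whose `label` is `"negative"`; the sketch instead retains them with a distinguishing tag, so that at inference time only the positive tag would be used in the prompt.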

Code & Models

Repositories

reason-wang/nat
Official

Models

Aznaur/qwen3-8b-fix-git-v7-nat-examples (Hugging Face model)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling

Methods: Linear Layer · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Multi-Head Attention · Layer Normalization · Dropout · Residual Connection