Learning From Failure: Integrating Negative Examples when Fine-tuning Large Language Models as Agents
Renxi Wang, Haonan Li, Xudong Han, Yixuan Zhang, Timothy Baldwin

TL;DR
This paper shows that including unsuccessful trajectories as negative examples during fine-tuning improves large language models' performance as agents on mathematical reasoning, multi-hop question answering, and strategic question answering tasks.
Contribution
The study fine-tunes LLMs on quality-controlled negative trajectories in addition to successful ones, reporting performance gains while making use of interaction data that would otherwise be discarded.
Findings
Improved performance on mathematical reasoning tasks.
Enhanced multi-hop and strategic question answering.
Better trade-off between the valuable information and the errors contained in unsuccessful trajectories.
Abstract
Large language models (LLMs) have achieved success in acting as agents, which interact with environments through tools such as search engines. However, LLMs are optimized for language generation instead of tool use during training or alignment, limiting their effectiveness as agents. To resolve this problem, previous work has first collected interaction trajectories between LLMs and environments, using only trajectories that successfully finished the task to fine-tune smaller models, making fine-tuning data scarce and acquiring it both difficult and costly. Discarding failed trajectories also leads to significant wastage of data and resources and limits the possible optimization paths during fine-tuning. In this paper, we argue that unsuccessful trajectories offer valuable insights, and LLMs can learn from these trajectories through appropriate quality control and fine-tuning…
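The abstract as shown here is truncated and does not spell out how negative trajectories are fed to the model, so the following is only a minimal sketch of one plausible data-preparation step. It assumes each trajectory carries a success flag, that failed trajectories pass through a simple filter before use, and that a short hint in the prompt tells the model which kind of trajectory it is seeing. The names `Trajectory`, `passes_quality_control`, `build_finetuning_examples`, the two hint strings, and the `max_steps` threshold are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Trajectory:
    task: str          # the question or instruction given to the agent
    steps: List[str]   # thoughts, tool calls, and observations as text
    success: bool      # True if the trajectory finished the task correctly


# Hypothetical control strings; the wording is an assumption for illustration.
POSITIVE_HINT = "Generate a trajectory that solves the task."
NEGATIVE_HINT = "Generate a trajectory that fails to solve the task."


def passes_quality_control(traj: Trajectory, max_steps: int = 10) -> bool:
    """Hypothetical filter: reject empty or overly long failed trajectories."""
    return 0 < len(traj.steps) <= max_steps


def build_finetuning_examples(
    trajectories: List[Trajectory], max_steps: int = 10
) -> List[Dict[str, str]]:
    """Keep all successful trajectories and the failed ones that pass the filter."""
    examples = []
    for traj in trajectories:
        if not traj.success and not passes_quality_control(traj, max_steps):
            continue  # discard low-quality negative trajectories only
        hint = POSITIVE_HINT if traj.success else NEGATIVE_HINT
        examples.append(
            {
                "prompt": f"{traj.task}\n{hint}",
                "completion": "\n".join(traj.steps),
                "label": "positive" if traj.success else "negative",
            }
        )
    return examples
```

Under these assumptions, inference would always use the positive hint, so the model is asked only for trajectories of the successful kind while still having seen the failed ones during training.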
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Linear Layer · Dense Connections · Label Smoothing · Adam · Attention Is All You Need · Softmax · Multi-Head Attention · Layer Normalization · Dropout · Residual Connection
