LLMs in the Imaginarium: Tool Learning through Simulated Trial and Error

Boshi Wang; Hao Fang; Jason Eisner; Benjamin Van Durme; Yu Su

arXiv:2403.04746 · cs.CL · March 8, 2024 · 1 citation

Open Access · 1 Repo

TL;DR

This paper introduces simulated trial and error (STE), a biologically inspired method that improves how accurately large language models use the tools they have been trained on. On ToolBench, STE boosts Mistral-Instruct-7B by 46.7% and enables it to outperform GPT-4.

Contribution

The paper proposes STE, a novel trial-and-error approach inspired by biological systems, to improve tool use accuracy in LLMs through imagination, memory, and interaction feedback.

Findings

01 STE boosts the tool-use accuracy of Mistral-Instruct-7B by 46.7% on ToolBench.

02 With STE, Mistral-Instruct-7B outperforms GPT-4 on tool-use tasks.

03 Simple experience replay enables effective continual learning of tools.
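
The experience-replay finding can be made concrete with a small sketch. This is not the paper's training code: the function name `mix_with_replay`, the `replay_ratio` parameter, and the tuple-shaped examples are all hypothetical. It only illustrates the general idea of experience replay, where examples for previously learned tools are mixed back into the training data for newly added tools so that earlier skills are less likely to be forgotten.

```python
import random


def mix_with_replay(new_examples, replay_buffer, replay_ratio=0.5, seed=0):
    """Return a shuffled training set of new examples plus replayed old ones.

    Hypothetical helper: replay_ratio is the number of replayed examples
    per new example (capped by the size of the replay buffer).
    """
    rng = random.Random(seed)
    k = min(len(replay_buffer), int(len(new_examples) * replay_ratio))
    replayed = rng.sample(replay_buffer, k)  # old-tool examples to revisit
    mixed = list(new_examples) + replayed
    rng.shuffle(mixed)  # interleave old and new tools
    return mixed


# Toy data: each example is (tool_name, example_id).
old_tool_examples = [("tool_a", i) for i in range(10)]
new_tool_examples = [("tool_b", i) for i in range(4)]
training_set = mix_with_replay(new_tool_examples, old_tool_examples, replay_ratio=0.5)
```

In this toy setup the training set keeps every new-tool example and adds two replayed old-tool examples. How much to replay is a tuning choice that the sketch does not try to settle.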

Abstract

Tools are essential for large language models (LLMs) to acquire up-to-date information and take consequential actions in external environments. Existing work on tool-augmented LLMs primarily focuses on the broad coverage of tools and the flexibility of adding new tools. However, a critical aspect that has surprisingly been understudied is simply how accurately an LLM uses tools for which it has been trained. We find that existing LLMs, including GPT-4 and open-source LLMs specifically fine-tuned for tool use, only reach a correctness rate in the range of 30% to 60%, far from reliable use in practice. We propose a biologically inspired method for tool-augmented LLMs, simulated trial and error (STE), that orchestrates three key mechanisms for successful tool use behaviors in the biological system: trial and error, imagination, and memory. Specifically, STE leverages an LLM's 'imagination'…
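
To make the three mechanisms named in the abstract easier to picture, here is a minimal sketch of a trial-and-error loop. It is an illustration under stated assumptions, not the authors' implementation: `toy_tool`, `propose_call`, `Explorer`, and the query templates are hypothetical stand-ins for a real API and for LLM prompts. The sketch assumes that "imagination" means proposing a plausible query for a tool, that a short-term memory holds the trials within one episode, and that a long-term memory records what earlier episodes already explored.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trial:
    query: str
    call: str
    feedback: str
    success: bool


def toy_tool(call):
    """Hypothetical API: succeeds only when the argument is a quoted string."""
    if call.startswith('lookup("') and call.endswith('")'):
        return "ok", True
    return "error: argument must be a quoted string", False


def propose_call(query, short_term):
    """Stand-in for the LLM: repairs the call using the latest feedback."""
    if short_term and "quoted" in short_term[-1].feedback:
        return f'lookup("{query}")'
    return f"lookup({query})"


@dataclass
class Explorer:
    long_term: list = field(default_factory=list)  # queries from past episodes

    def imagine_query(self, tool_name, rng):
        # Stand-in for an LLM prompt; prefers queries not explored before.
        candidates = [f"{tool_name} question {i}" for i in range(10)]
        fresh = [q for q in candidates if q not in self.long_term]
        return rng.choice(fresh or candidates)

    def run_episode(self, tool_name, tool, rng, max_attempts=3):
        query = self.imagine_query(tool_name, rng)
        short_term = []  # trials within this episode
        for _ in range(max_attempts):
            call = propose_call(query, short_term)
            feedback, ok = tool(call)
            short_term.append(Trial(query, call, feedback, ok))
            if ok:
                break
        self.long_term.append(query)
        return short_term


rng = random.Random(0)
explorer = Explorer()
first = explorer.run_episode("weather", toy_tool, rng)
second = explorer.run_episode("weather", toy_tool, rng)
```

In this toy run each episode fails once on an unquoted argument, reads the error message, and succeeds on the second attempt, while the long-term memory steers the second episode toward a query the first did not try. A real system would replace the stand-ins with model calls and actual tool execution.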

Code & Models

Repositories

microsoft/simulated-trial-and-error (PyTorch, official)

Taxonomy

Topics: Artificial Intelligence in Law · Legal Education and Practice Innovations · Law, AI, and Intellectual Property

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Dropout · Softmax · Dense Connections · Label Smoothing · Adam · Absolute Position Encodings · Residual Connection