Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees

Sijia Chen; Yibo Wang; Yi-Feng Wu; Qing-Guo Chen; Zhao Xu; Weihua Luo; Kaifu Zhang; Lijun Zhang

arXiv:2406.07115·cs.CL·March 24, 2025·1 cites

Open Access · 8 Models · 1 Dataset · 1 Video

TL;DR

This paper introduces TP-LLaMA, a novel framework that leverages both successful and failed inference trajectories in tool-augmented LLMs to improve reasoning performance and generalization across complex tasks.

Contribution

It proposes a preference learning-based inference trajectory optimization method that utilizes failed paths, enhancing the learning and reasoning capabilities of tool-augmented LLMs.

Findings

1. TP-LLaMA outperforms baselines in most test scenarios.

2. It demonstrates improved generalization to unseen APIs.

3. It achieves higher reasoning efficiency.

Abstract

Tool-augmented large language models (LLMs) leverage tools, often in the form of APIs, to improve their reasoning capabilities on complex tasks. This enables them to act as intelligent agents interacting with the real world. The recently introduced ToolLLaMA model by Qin et al. [2023] utilizes the depth-first search-based decision tree (DFSDT) mechanism for multi-step reasoning with 16,000+ real-world APIs, effectively enhancing the performance of tool-augmented LLMs compared to traditional chain reasoning mechanisms. However, their approach only employs successful paths from decision trees (also called inference trees) for supervised fine-tuning (SFT), missing out on the potential learning opportunities from failed paths. Inspired by this, we propose an inference trajectory optimization framework based on preference learning to address this limitation. We first introduce a novel…
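
To make the idea of learning from failed paths concrete, here is a minimal Python sketch of one possible way to turn an inference tree into preference pairs. It is an illustration under stated assumptions, not the authors' construction: the `Node` class, the `on_success_path` and `preference_pairs` helpers, and the toy actions are all hypothetical names introduced here. At each branching point, a child that lies on a successful path is treated as preferred over a sibling whose subtree contains no success.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """One step in an inference tree (hypothetical structure for illustration)."""
    action: str
    success: bool = False  # True only on a leaf that solved the task
    children: List["Node"] = field(default_factory=list)


def on_success_path(node: Node) -> bool:
    """Return True if this node or any descendant is a successful leaf."""
    return node.success or any(on_success_path(c) for c in node.children)


def preference_pairs(
    node: Node, prefix: Tuple[str, ...] = ()
) -> List[Tuple[Tuple[str, ...], str, str]]:
    """Collect (history, preferred_action, rejected_action) triples.

    Assumption: a sibling whose subtree never succeeds counts as "rejected".
    Only successful branches are explored further.
    """
    history = prefix + (node.action,)
    good = [c for c in node.children if on_success_path(c)]
    bad = [c for c in node.children if not on_success_path(c)]
    pairs = [(history, g.action, b.action) for g in good for b in bad]
    for g in good:
        pairs.extend(preference_pairs(g, history))
    return pairs


# Toy tree: one branch reaches a successful leaf, the other fails.
tree = Node(
    "user query",
    children=[
        Node("search_flights", children=[Node("finish", success=True)]),
        Node("search_hotels", children=[Node("give_up")]),
    ],
)
pairs = preference_pairs(tree)
print(pairs)  # [(('user query',), 'search_flights', 'search_hotels')]
```

Triples of this shape (shared history, preferred step, rejected step) are the usual input to preference-learning objectives. Supervised fine-tuning on successful paths alone would discard the `search_hotels` branch entirely.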

Code & Models

Models

Datasets

chrissiecsj/ToolPreference
dataset · 99 downloads

Videos

Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees · SlidesLive

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling