CITI: Enhancing Tool Utilizing Ability in Large Language Models without Sacrificing General Performance

Yupu Hao; Pengfei Cao; Zhuoran Jin; Huanxuan Liao; Yubo Chen; Kang Liu; Jun Zhao

arXiv:2409.13202 · cs.CL · September 24, 2024

TL;DR

This paper introduces CITI, a method that improves large language models' ability to use external tools without sacrificing their overall performance, by selectively fine-tuning model components based on their importance.

Contribution

The paper proposes a novel component importance-based approach (CITI) that balances tool-utilization and general performance in LLMs through targeted fine-tuning strategies.

Findings

1. CITI enhances tool-utilizing ability effectively.

2. It maintains high general performance of LLMs.

3. Experimental results show superior evaluation metrics.

Abstract

Tool learning enables Large Language Models (LLMs) to interact with the external environment by invoking tools, improving the accuracy of LLMs and broadening the scope of their capabilities. However, previous works predominantly focus on improving a model's tool-utilizing accuracy and its ability to generalize to new, unseen tools, excessively forcing LLMs to adapt to specific tool-invoking patterns without considering the harm to the model's general performance. This deviates from actual applications and from the original intention of integrating tools to enhance the model. To tackle this problem, we dissect the capability trade-offs by examining the hidden representation changes and the gradient-based importance scores of the model's components. Based on the analysis results, we propose a Component Importance-based Tool-utilizing ability Injection method (CITI). According to the gradient-based importance score of different…
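To make the idea of a gradient-based importance score concrete, the following is a minimal sketch and not the authors' implementation. It assumes a common first-order formulation in which a component's importance is the sum of |θ · ∂L/∂θ| over its parameters; the abstract does not state the exact formula CITI uses. The component names (`attn`, `ffn`), the parameter values, and the gradients are hypothetical and chosen only for illustration.

```python
import numpy as np

def importance_scores(params, grads):
    """Score each named component by sum(|theta * dL/dtheta|).

    This first-order formula is an assumption for illustration,
    not necessarily the score defined in the paper.
    """
    return {
        name: float(np.sum(np.abs(params[name] * grads[name])))
        for name in params
    }

def rank_components(scores):
    """Return component names ordered from most to least important."""
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical parameters and gradients for two toy components.
params = {"attn": np.array([1.0, -2.0]), "ffn": np.array([0.5])}
grads = {"attn": np.array([0.5, 0.25]), "ffn": np.array([-4.0])}

scores = importance_scores(params, grads)
ranking = rank_components(scores)
```

Under this assumed formulation, a ranking like `ranking` could be used to decide which components receive which fine-tuning treatment, in the spirit of the selective fine-tuning described in the TL;DR.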

Code & Models

Repositories

hypasd-art/CITI · jax · Official

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Focus