ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages

Junjie Ye; Sixian Li; Guanyu Li; Caishuang Huang; Songyang Gao; Yilong Wu; Qi Zhang; Tao Gui; Xuanjing Huang

arXiv:2402.10753·cs.CL·August 19, 2024·2 cites

TL;DR

ToolSword is a comprehensive framework that systematically investigates safety issues in large language models during tool learning, revealing persistent challenges across multiple safety scenarios even in advanced models like GPT-4.

Contribution

The paper introduces ToolSword, a novel framework that identifies and analyzes six safety scenarios in LLM tool learning, filling a critical research gap in safety considerations.

Findings

01. Safety challenges persist across all stages of tool learning.

02. Even GPT-4 is vulnerable to safety issues in tool learning.

03. The framework facilitates targeted research on improving LLM safety.

Abstract

Tool learning is widely acknowledged as a foundational approach for deploying large language models (LLMs) in real-world scenarios. While current research primarily emphasizes leveraging tools to augment LLMs, it frequently neglects emerging safety considerations tied to their application. To fill this gap, we present *ToolSword*, a comprehensive framework dedicated to meticulously investigating safety issues linked to LLMs in tool learning. Specifically, ToolSword delineates six safety scenarios for LLMs in tool learning, encompassing **malicious queries** and **jailbreak attacks** in the input stage, **noisy misdirection** and **risky cues** in the execution stage, and **harmful feedback** and **error conflicts** in the output stage. Experiments conducted on 11 open-source and closed-source LLMs reveal enduring safety challenges in tool learning, such as handling harmful queries,…
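The stage-to-scenario grouping named in the abstract can be written down as a small lookup table. The sketch below is illustrative only: the names `Stage`, `SCENARIOS`, and `stage_of` are hypothetical and are not taken from the paper or its repository; only the three stage names and six scenario names come from the abstract.

```python
from enum import Enum


class Stage(Enum):
    """The three tool-learning stages named in the abstract."""
    INPUT = "input"
    EXECUTION = "execution"
    OUTPUT = "output"


# Two safety scenarios per stage, as listed in the abstract.
# The dictionary layout itself is an assumption made for illustration.
SCENARIOS = {
    Stage.INPUT: ("malicious queries", "jailbreak attacks"),
    Stage.EXECUTION: ("noisy misdirection", "risky cues"),
    Stage.OUTPUT: ("harmful feedback", "error conflicts"),
}


def stage_of(scenario: str) -> Stage:
    """Return the stage a scenario belongs to, or raise KeyError if unknown."""
    for stage, names in SCENARIOS.items():
        if scenario in names:
            return stage
    raise KeyError(scenario)
```

For instance, `stage_of("risky cues")` returns `Stage.EXECUTION`, and the table holds six scenarios in total.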

Code & Models

Repositories

junjie-ye/toolsword (Official)

Videos

ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages

Taxonomy

Topics: Safety Systems Engineering in Autonomy · Adversarial Robustness in Machine Learning

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Adam · Softmax · Multi-Head Attention · Layer Normalization · Residual Connection · Dropout