ToolSword: Unveiling Safety Issues of Large Language Models in Tool Learning Across Three Stages
Junjie Ye, Sixian Li, Guanyu Li, Caishuang Huang, Songyang Gao, Yilong, Wu, Qi Zhang, Tao Gui, Xuanjing Huang

TL;DR
ToolSword is a comprehensive framework that systematically investigates safety issues in large language models during tool learning, revealing persistent challenges across multiple safety scenarios even in advanced models like GPT-4.
Contribution
The paper introduces ToolSword, a novel framework that identifies and analyzes six safety scenarios in LLM tool learning, filling a critical research gap in safety considerations.
Findings
Safety challenges persist across all stages of tool learning.
Even GPT-4 is vulnerable to safety issues in tool learning.
The framework facilitates targeted research on improving LLM safety.
Abstract
Tool learning is widely acknowledged as a foundational approach or deploying large language models (LLMs) in real-world scenarios. While current research primarily emphasizes leveraging tools to augment LLMs, it frequently neglects emerging safety considerations tied to their application. To fill this gap, we present *ToolSword*, a comprehensive framework dedicated to meticulously investigating safety issues linked to LLMs in tool learning. Specifically, ToolSword delineates six safety scenarios for LLMs in tool learning, encompassing **malicious queries** and **jailbreak attacks** in the input stage, **noisy misdirection** and **risky cues** in the execution stage, and **harmful feedback** and **error conflicts** in the output stage. Experiments conducted on 11 open-source and closed-source LLMs reveal enduring safety challenges in tool learning, such as handling harmful queries,…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
Taxonomy
TopicsSafety Systems Engineering in Autonomy · Adversarial Robustness in Machine Learning
MethodsAttention Is All You Need · Linear Layer · Dense Connections · Label Smoothing · Adam · Softmax · Multi-Head Attention · Layer Normalization · Residual Connection · Dropout
