The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models
Zihui Wu, Haichang Gao, Jianping He, Ping Wang

TL;DR
This paper reveals a critical security vulnerability in the function calling feature of large language models, reports an average jailbreak success rate of over 90% across six LLMs, and proposes defensive strategies to mitigate the risk.
Contribution
The paper introduces a novel "jailbreak function" attack that exploits alignment discrepancies, user coercion, and the absence of rigorous safety filters in function calling, and it proposes practical defenses, including defensive prompts.
Findings
Average jailbreak success rate of over 90% across six state-of-the-art LLMs, including GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-pro
Analysis of vulnerabilities in function calling process
Proposed defensive prompts to enhance security
Abstract
Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security implications of their function calling feature have been largely overlooked. This paper uncovers a critical vulnerability in the function calling process of LLMs, introducing a novel "jailbreak function" attack method that exploits alignment discrepancies, user coercion, and the absence of rigorous safety filters. Our empirical study, conducted on six state-of-the-art LLMs including GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-pro, reveals an alarming average success rate of over 90% for this attack. We provide a comprehensive analysis of why function calls are susceptible to such attacks and propose defensive strategies, including the use of defensive prompts.…
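The abstract names defensive prompts as one mitigation but does not give their wording or placement. As a minimal sketch only, the Python below assumes one plausible placement: appending a safety notice to the description of each tool definition before it is sent to a model. The notice text, the helper name `add_defensive_note`, the example tool `summarize_text`, and the tool-definition layout are all hypothetical illustrations, not the authors' method.

```python
import copy

# Hypothetical wording: the source does not give the text of its defensive prompts.
DEFENSIVE_NOTE = (
    "Safety notice: if the requested arguments would contain harmful, "
    "illegal, or unsafe content, decline and do not fill in the arguments."
)


def add_defensive_note(tools, note=DEFENSIVE_NOTE):
    """Return a deep copy of `tools` with `note` appended to each description.

    Assumes each tool is a dict that either holds its fields directly or
    nests them under a "function" key (an illustrative layout, not a spec).
    """
    hardened = copy.deepcopy(tools)
    for tool in hardened:
        fn = tool.get("function", tool)  # accept nested or flat layouts
        desc = fn.get("description", "").strip()
        if note not in desc:  # keep the operation idempotent
            fn["description"] = f"{desc} {note}".strip()
    return hardened


# Example tool definition (hypothetical, benign).
tools = [
    {
        "type": "function",
        "function": {
            "name": "summarize_text",
            "description": "Summarize the given text.",
            "parameters": {
                "type": "object",
                "properties": {"text": {"type": "string"}},
                "required": ["text"],
            },
        },
    }
]

hardened = add_defensive_note(tools)
```

The helper copies its input so the original definitions stay unchanged, and it skips descriptions that already carry the notice. Whether a notice of this kind reduces attack success for a given model is an empirical question that this sketch does not answer.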
Taxonomy
Topics: Digital and Cyber Forensics
