The Dark Side of Function Calling: Pathways to Jailbreaking Large Language Models

Zihui Wu; Haichang Gao; Jianping He; Ping Wang

arXiv:2407.17915·cs.CR·December 25, 2024·2 cites

TL;DR

This paper reveals a critical security vulnerability in the function calling feature of large language models, demonstrating a high success rate for jailbreak attacks and proposing defensive strategies to mitigate this risk.

Contribution

It introduces a novel "jailbreak function" attack that exploits alignment discrepancies, user coercion, and the absence of rigorous safety filters in function calling, and it proposes defensive strategies, including defensive prompts.

Findings

01 Average jailbreak success rate of over 90% across six LLMs

02 Analysis of why the function calling process is susceptible to the attack

03 Proposed defensive prompts to mitigate the attack

Abstract

Large language models (LLMs) have demonstrated remarkable capabilities, but their power comes with significant security considerations. While extensive research has been conducted on the safety of LLMs in chat mode, the security implications of their function calling feature have been largely overlooked. This paper uncovers a critical vulnerability in the function calling process of LLMs, introducing a novel "jailbreak function" attack method that exploits alignment discrepancies, user coercion, and the absence of rigorous safety filters. Our empirical study, conducted on six state-of-the-art LLMs including GPT-4o, Claude-3.5-Sonnet, and Gemini-1.5-pro, reveals an alarming average success rate of over 90% for this attack. We provide a comprehensive analysis of why function calls are susceptible to such attacks and propose defensive strategies, including the use of defensive prompts.…

Code & Models

Repositories

wooozihui/jailbreakfunction (PyTorch, official)

Taxonomy

Topics: Digital and Cyber Forensics