EasyJailbreak: A Unified Framework for Jailbreaking Large Language Models
Weikang Zhou, Xiao Wang, Limao Xiong, Han Xia, Yingshuang Gu, Mingxu, Chai, Fukang Zhu, Caishuang Huang, Shihan Dou, Zhiheng Xi, Rui Zheng,, Songyang Gao, Yicheng Zou, Hang Yan, Yifan Le, Ruohui Wang, Lijun Li, Jing, Shao, Tao Gui, Qi Zhang, Xuanjing Huang

TL;DR
EasyJailbreak is a modular, unified framework that simplifies constructing and evaluating jailbreak attacks on large language models, revealing significant vulnerabilities across multiple models and supporting comprehensive security assessments.
Contribution
The paper introduces EasyJailbreak, a modular framework supporting 11 jailbreak methods, enabling standardized, efficient security evaluation of diverse large language models.
Findings
Average breach probability of 60% across tested LLMs
GPT-3.5-Turbo and GPT-4 have attack success rates of 57% and 33%
EasyJailbreak supports broad security validation and resource sharing
Abstract
Jailbreak attacks are crucial for identifying and mitigating the security vulnerabilities of Large Language Models (LLMs). They are designed to bypass safeguards and elicit prohibited outputs. However, due to significant differences among various jailbreak methods, there is no standard implementation framework available for the community, which limits comprehensive security evaluations. This paper introduces EasyJailbreak, a unified framework simplifying the construction and evaluation of jailbreak attacks against LLMs. It builds jailbreak attacks using four components: Selector, Mutator, Constraint, and Evaluator. This modular framework enables researchers to easily construct attacks from combinations of novel and existing components. So far, EasyJailbreak supports 11 distinct jailbreak methods and facilitates the security validation of a broad spectrum of LLMs. Our validation across…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsDigital and Cyber Forensics
MethodsRefunds@Expedia|||How do I get a full refund from Expedia? · {Dispute@FaQ-s}How to file a dispute with Expedia? · 15 Ways to Contact How can i speak to someone at Delta Airlines · Attention Is All You Need · Absolute Position Encodings · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Transformer · Softmax
