AutoJailbreak: Exploring Jailbreak Attacks and Defenses through a Dependency Lens

Lin Lu; Hai Yan; Zenghui Yuan; Jiawen Shi; Wenqi Wei; Pin-Yu Chen; Pan Zhou

arXiv:2406.03805·cs.CR·June 7, 2024

Open Access

TL;DR

This paper systematically analyzes jailbreak attacks and defenses in large language models using dependency relationships, proposing automated frameworks that improve attack and defense effectiveness and introduce a new evaluation method.

Contribution

It introduces a dependency-based framework for analyzing and automating jailbreak attacks and defenses, enhancing scalability and effectiveness over existing methods.

Findings

01. Ensemble jailbreak attack outperforms existing methods.

02. AutoDefense improves defense robustness.

03. AutoEvaluation effectively distinguishes hallucinations.

Abstract

Jailbreak attacks in large language models (LLMs) entail inducing the models to generate content that breaches ethical and legal norms through the use of malicious prompts, posing a substantial threat to LLM security. Current strategies for jailbreak attack and defense often focus on optimizing locally within specific algorithmic frameworks, resulting in ineffective optimization and limited scalability. In this paper, we present a systematic analysis of the dependency relationships in jailbreak attack and defense techniques, generalizing them to all possible attack surfaces. We employ directed acyclic graphs (DAGs) to position and analyze existing jailbreak attacks, defenses, and evaluation methodologies, and propose three comprehensive, automated, and logical frameworks. AutoAttack investigates dependencies in two lines of jailbreak optimization strategies: genetic algorithm…

Taxonomy

Topics: Digital and Cyber Forensics · Advanced Malware Detection Techniques · Information and Cyber Security