Bag of Tricks: Benchmarking of Jailbreak Attacks on LLMs

Zhao Xu; Fan Liu; Hao Liu

arXiv:2406.09324 · cs.CR · November 7, 2024

TL;DR

This paper introduces JailTrickBench, a comprehensive benchmarking framework for evaluating jailbreak attacks on LLMs, emphasizing the importance of standardized assessment across various attack settings and defense methods.

Contribution

The paper presents JailTrickBench, a new benchmark for systematically evaluating jailbreak attacks on LLMs, including diverse attack factors and defense scenarios, with extensive experimental validation.

Findings

01 Standardized benchmarking reveals vulnerabilities in defense-enhanced LLMs.

02 The benchmark evaluates eight key attack factors across multiple datasets and defenses.

03 Approximately 354 experiments demonstrate the framework's effectiveness.

Abstract

Although Large Language Models (LLMs) have demonstrated significant capabilities in executing complex tasks in a zero-shot manner, they are susceptible to jailbreak attacks and can be manipulated to produce harmful outputs. Recently, a growing body of research has categorized jailbreak attacks into token-level and prompt-level attacks. However, previous work has largely overlooked the diverse key factors of jailbreak attacks, with most studies concentrating on LLM vulnerabilities and lacking exploration of defense-enhanced LLMs. To address these issues, we introduce JailTrickBench to evaluate the impact of various attack settings on LLM performance and provide a baseline for jailbreak attacks, encouraging the adoption of a standardized evaluation framework. Specifically, we evaluate the eight key factors of implementing jailbreak attacks on LLMs from both target-level and…

Taxonomy

Topics: Digital and Cyber Forensics · Cybercrime and Law Enforcement Studies