Boosting Jailbreak Transferability for Large Language Models

Hanqing Liu; Lifeng Zhou; Huanqian Yan

arXiv:2410.15645·cs.AI·November 5, 2024

TL;DR

This paper improves the transferability of GCG-style jailbreak attacks on large language models, reporting nearly 100% success rates and first place in the AISG-hosted Global Challenge for Safe and Secure LLMs.

Contribution

We propose three enhancements to GCG that boost jailbreak transferability across models: a scenario induction template, optimized suffix selection, and a re-suffix attack mechanism that reduces inconsistent outputs.

Findings

1. Achieved nearly 100% success in attack transferability.
2. Outperformed existing methods in extensive benchmarks.
3. Won first place in the AISG global challenge.

Abstract

Large language models have drawn significant attention to the challenge of safe alignment, especially regarding jailbreak attacks that circumvent security measures to produce harmful content. To address the limitations of existing methods such as GCG, which perform well in single-model attacks but lack transferability, we propose several enhancements, including a scenario induction template, optimized suffix selection, and the integration of a re-suffix attack mechanism to reduce inconsistent outputs. Our approach has shown superior performance in extensive experiments across various benchmarks, achieving nearly 100% success rates in both attack execution and transferability. Notably, our method won first place in the AISG-hosted Global Challenge for Safe and Secure LLMs. The code is released at https://github.com/HqingLiu/SI-GCG.

Code & Models

Repositories

HqingLiu/SI-GCG (PyTorch, official)

Taxonomy

Topics: Digital and Cyber Forensics · Adversarial Robustness in Machine Learning

Methods: Softmax · Attention Is All You Need