Boosting Jailbreak Transferability for Large Language Models
Hanqing Liu, Lifeng Zhou, Huanqian Yan

TL;DR
This paper introduces a method that improves the transferability of jailbreak attacks on large language models, achieving nearly 100% success rates and winning first place in the AISG-hosted Global Challenge for Safe and Secure LLMs.
Contribution
We propose several enhancements to GCG-style attacks, including a scenario induction template, optimized suffix selection, and a re-suffix attack mechanism, to boost jailbreak transferability across models.
Findings
Achieved nearly 100% success in attack transferability.
Outperformed existing methods in extensive benchmarks.
Won first place in the AISG global challenge.
Abstract
The rise of large language models has drawn significant attention to the challenge of safety alignment, especially regarding jailbreak attacks that circumvent security measures to produce harmful content. To address the limitations of existing methods like GCG, which perform well in single-model attacks but lack transferability, we propose several enhancements, including a scenario induction template, optimized suffix selection, and the integration of a re-suffix attack mechanism to reduce inconsistent outputs. Our approach has shown superior performance in extensive experiments across various benchmarks, achieving nearly 100% success rates in both attack execution and transferability. Notably, our method won first place in the AISG-hosted Global Challenge for Safe and Secure LLMs. The code is released at https://github.com/HqingLiu/SI-GCG.
Taxonomy
Topics: Digital and Cyber Forensics · Adversarial Robustness in Machine Learning
Methods: Softmax · Attention Is All You Need
