AmpleGCG: Learning a Universal and Transferable Generative Model of Adversarial Suffixes for Jailbreaking Both Open and Closed LLMs
Zeyi Liao, Huan Sun

TL;DR
This paper introduces AmpleGCG, a generative model that efficiently produces adversarial suffixes to jailbreak both open and closed large language models, significantly improving attack success rates and transferability.
Contribution
AmpleGCG is a novel generative approach that learns the distribution of adversarial suffixes, enabling rapid, universal, and transferable attacks on various LLMs, surpassing existing methods.
Findings
Achieves near 100% attack success on Llama-2-7B-chat and Vicuna-7B.
Transfers seamlessly to attack GPT-3.5 with 99% success.
Generates 200 suffixes in 4 seconds, increasing attack efficiency.
Abstract
As large language models (LLMs) become increasingly prevalent and integrated into autonomous systems, ensuring their safety is imperative. Despite significant strides toward safety alignment, recent work, GCG (Zou et al., 2023), proposes a discrete token optimization algorithm and selects the single suffix with the lowest loss to successfully jailbreak aligned LLMs. In this work, we first discuss the drawbacks of solely picking the lowest-loss suffix during GCG optimization and uncover the successful suffixes that are missed at intermediate steps. We then use those successful suffixes as training data to learn a generative model, named AmpleGCG. AmpleGCG captures the distribution of adversarial suffixes given a harmful query and can generate hundreds of suffixes for any harmful query in seconds. AmpleGCG achieves near 100% attack…
Code & Models
- 🤗 osunlp/AmpleGCG-llama2-sourced-llama2-7b-chat (model)
- 🤗 osunlp/AmpleGCG-llama2-sourced-vicuna-7b (model)
- 🤗 osunlp/AmpleGCG-llama2-sourced-vicuna-7b13b-guanaco-7b13b (model)
- 🤗 osunlp/AmpleGCG-plus-llama2-sourced-llama2-7b-chat (model)
- 🤗 osunlp/AmpleGCG-plus-llama2-sourced-vicuna-7b13b-guanaco-7b13b (model)
- 🤗 RichardErkhov/osunlp_-_AmpleGCG-llama2-sourced-llama2-7b-chat-gguf (model)
Taxonomy
Topics: Adversarial Robustness in Machine Learning
Methods: Attention Is All You Need · Cosine Annealing · Linear Layer · Layer Normalization · Dense Connections · Attention Dropout · Residual Connection · Linear Warmup With Cosine Annealing
