Boosting Jailbreak Attack with Momentum

Yihao Zhang; Zeming Wei

arXiv:2405.01229 · cs.LG · March 4, 2025

TL;DR

This paper introduces MAC (Momentum Accelerated GCG), a jailbreak attack on large language models. MAC adds a momentum term to the gradient heuristic of the Greedy Coordinate Gradient (GCG) attack, which stabilizes the optimization of the adversarial prompt and improves both the efficiency and the success rate of the attack.

Contribution

The paper proposes a novel momentum-based optimization technique for adversarial prompt generation, significantly improving attack success and efficiency against LLMs.

Findings

01. MAC outperforms baseline attacks in success rate.
02. MAC enhances optimization efficiency.
03. MAC remains effective under defense mechanisms.

Abstract

Large Language Models (LLMs) have achieved remarkable success across diverse tasks, yet they remain vulnerable to adversarial attacks, notably the well-known jailbreak attack. In particular, the Greedy Coordinate Gradient (GCG) attack has demonstrated efficacy in exploiting this vulnerability by optimizing adversarial prompts through a combination of gradient heuristics and greedy search. However, the efficiency of this attack has become a bottleneck in the attacking process. To mitigate this limitation, in this paper we rethink the generation of the adversarial prompts through an optimization lens, aiming to stabilize the optimization process and harness more heuristic insights from previous optimization iterations. Specifically, we propose the Momentum Accelerated GCG (MAC) attack, which integrates a momentum term into the gradient heuristic to…
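The abstract describes GCG as a gradient heuristic (ranking candidate token swaps) followed by a greedy search, and MAC as the same loop with a momentum term added to the heuristic. The following is a minimal toy sketch of that idea under loudly stated assumptions. No language model is involved. A hypothetical per-position cost table stands in for the attack loss, and Gaussian noise stands in for the variability of the gradient heuristic. The heavy-ball update m = mu * m + g is an assumed form of the momentum term and is not necessarily the exact update used in the paper or in weizeming/momentum-attack-llm. All names (momentum_gcg_sketch, cost, top_k, batch, mu, noise) are hypothetical.

```python
import random


def momentum_gcg_sketch(cost, steps=20, top_k=3, batch=8, mu=0.9, noise=0.5, seed=0):
    """Toy GCG-style loop with a momentum-smoothed gradient heuristic.

    Hypothetical setup: cost[i][v] is the cost of placing token v at suffix
    position i, and the loss of a suffix is the sum of its token costs. The
    gradient w.r.t. the one-hot token indicators at position i is therefore
    cost[i]; Gaussian noise is added to mimic a noisy heuristic.
    """
    rng = random.Random(seed)
    n_pos, vocab = len(cost), len(cost[0])

    def loss(suffix):
        return sum(cost[i][t] for i, t in enumerate(suffix))

    suffix = [0] * n_pos
    # Momentum buffer with the same shape as the gradient (positions x vocab).
    m = [[0.0] * vocab for _ in range(n_pos)]
    for _ in range(steps):
        # Noisy gradient of the loss w.r.t. the one-hot token indicators.
        grad = [[c + rng.gauss(0.0, noise) for c in row] for row in cost]
        # Assumed momentum form: accumulate past gradients into the heuristic.
        m = [[mu * mv + g for mv, g in zip(m_row, g_row)] for m_row, g_row in zip(m, grad)]
        # Per position, keep the top-k tokens with the most negative momentum gradient.
        cands = [sorted(range(vocab), key=lambda v: row[v])[:top_k] for row in m]
        # Greedy step: try `batch` single-token swaps and keep the best one,
        # or the current suffix if no swap lowers the loss.
        best, best_loss = suffix, loss(suffix)
        for _ in range(batch):
            pos = rng.randrange(n_pos)
            trial = suffix.copy()
            trial[pos] = rng.choice(cands[pos])
            if loss(trial) < best_loss:
                best, best_loss = trial, loss(trial)
        suffix = best
    return suffix, loss(suffix)


if __name__ == "__main__":
    toy_cost = [
        [3.0, 1.0, 0.0, 2.0],
        [0.5, 2.0, 1.5, 0.0],
        [1.0, 0.0, 3.0, 2.0],
    ]
    print(momentum_gcg_sketch(toy_cost))
```

With mu=0 the heuristic uses only the current gradient, which gives a plain GCG-style step. Larger values of mu let earlier gradients smooth the candidate ranking across iterations. This is the kind of stabilizing effect the abstract aims for, shown here only on a toy objective.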

Code & Models

Repositories

weizeming/momentum-attack-llm (PyTorch, official)

Taxonomy

Topics: Advanced Malware Detection Techniques · Network Security and Intrusion Detection · Adversarial Robustness in Machine Learning

Methods: Random Search