GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs

Advik Raj Basani; Xiao Zhang

arXiv:2411.14133·cs.LG·November 7, 2025

TL;DR

GASP is a novel black-box framework that efficiently generates human-readable adversarial prompts to jailbreak large language models, improving success rates while reducing computational costs.

Contribution

GASP introduces a Bayesian optimization-based method that generates natural-sounding adversarial suffixes in a fully black-box setting, which improves both scalability and effectiveness.
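To make the term "Bayesian optimization" concrete, here is a minimal, generic sketch of the technique on a toy problem. It is not the GASP implementation and involves no language model: `toy_score`, `bayes_opt`, the two-dimensional search space, the RBF kernel, and the expected-improvement acquisition are all assumptions chosen for illustration. The only point it shows is how a black-box score over a continuous space can be maximized using queries alone, with no gradients.

```python
import math
import numpy as np


def rbf_kernel(a, b, length_scale=0.5):
    # Squared-exponential kernel between the rows of a and the rows of b.
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / length_scale**2)


def gp_posterior(x_obs, y_obs, x_query, noise=1e-4):
    # Posterior mean and standard deviation of a constant-mean GP at x_query.
    y_mean = y_obs.mean()
    k_oo = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_oq = rbf_kernel(x_obs, x_query)
    solved = np.linalg.solve(k_oo, k_oq)  # K^{-1} k_*
    mean = solved.T @ (y_obs - y_mean) + y_mean
    var = 1.0 - np.sum(k_oq * solved, axis=0)
    return mean, np.sqrt(np.maximum(var, 1e-12))


def expected_improvement(mean, std, best):
    # Expected improvement over the best observed score (maximization).
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z**2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def toy_score(z):
    # Hypothetical black-box objective: a smooth bump peaking at (0.3, -0.2).
    target = np.array([0.3, -0.2])
    return -np.sum((z - target) ** 2, axis=-1)


def bayes_opt(score_fn, dim=2, n_init=5, n_iter=20, n_candidates=500, seed=0):
    rng = np.random.default_rng(seed)
    x_obs = rng.uniform(-1.0, 1.0, size=(n_init, dim))  # initial random queries
    y_obs = score_fn(x_obs)
    for _ in range(n_iter):
        # Score random candidates with the surrogate, then query the most promising one.
        candidates = rng.uniform(-1.0, 1.0, size=(n_candidates, dim))
        mean, std = gp_posterior(x_obs, y_obs, candidates)
        ei = expected_improvement(mean, std, y_obs.max())
        x_next = candidates[np.argmax(ei)]
        x_obs = np.vstack([x_obs, x_next])
        y_obs = np.append(y_obs, score_fn(x_next))
    best = int(np.argmax(y_obs))
    return x_obs[best], y_obs[best]
```

Calling `bayes_opt(toy_score)` returns the best point found and its score. The design choice worth noting is that the objective is only ever called as an opaque function, which is what "black-box" means here; the Gaussian-process surrogate decides where to query next.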

Findings

01

GASP significantly outperforms baseline methods in jailbreak success rate.

02

It reduces training time and accelerates inference for adversarial prompt generation.

03

GASP produces more natural, human-readable prompts than previous approaches.

Abstract

LLMs have shown impressive capabilities across various natural language processing tasks, yet remain vulnerable to input prompts, known as jailbreak attacks, carefully designed to bypass safety guardrails and elicit harmful responses. Traditional methods rely on manual heuristics but suffer from limited generalizability. Despite being automatic, optimization-based attacks often produce unnatural prompts that can be easily detected by safety filters or require high computational costs due to discrete token optimization. In this paper, we introduce Generative Adversarial Suffix Prompter (GASP), a novel automated framework that can efficiently generate human-readable jailbreak prompts in a fully black-box setting. In particular, GASP leverages latent Bayesian optimization to craft adversarial suffixes by efficiently exploring continuous latent embedding spaces, gradually optimizing the…

Code & Models

Repositories

TrustMLRG/GASP
PyTorch · Official

Videos

GASP: Efficient Black-Box Generation of Adversarial Suffixes for Jailbreaking LLMs · SlidesLive

Taxonomy

Topics: Handwritten Text Recognition Techniques · Vehicle License Plate Recognition · Digital Media Forensic Detection