PLA: Prompt Learning Attack against Text-to-Image Generative Models

Xinqi Lyu, Yihao Liu, Yanjie Li, and Bin Xiao

arXiv:2508.03696 · cs.CR · August 7, 2025

TL;DR

This paper introduces PLA, a novel prompt learning attack framework that effectively bypasses safety mechanisms in black-box text-to-image models using gradient-based training with multimodal similarities.

Contribution

The paper presents a prompt learning attack method tailored for black-box text-to-image (T2I) models, overcoming the limited search space of previous word-substitution approaches.

Findings

01. PLA achieves higher attack success rates than existing methods.

02. The framework effectively bypasses prompt filters and safety checkers.

03. Gradient-based training with multimodal similarities enhances attack performance.

Abstract

Text-to-Image (T2I) models have gained widespread adoption across various applications. Despite this success, the potential misuse of T2I models poses a significant risk of generating Not-Safe-For-Work (NSFW) content. To investigate the vulnerability of T2I models, this paper studies adversarial attacks that bypass their safety mechanisms under black-box settings. Most previous methods rely on word substitution to search for adversarial prompts. Because the search space is limited, these methods perform worse than gradient-based training. However, black-box settings present unique challenges to training gradient-driven attack methods, since there is no access to the internal architecture and parameters of T2I models. To facilitate the learning of adversarial prompts in black-box settings, we propose a novel prompt learning attack framework (PLA), where insightful gradient-based…
