Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models

Shiqian Zhao; Chong Wang; Yiming Li; Yihao Huang; Wenjie Qu; Siew-Kei Lam; Yi Xie; Kangjie Chen; Jie Zhang; Tianwei Zhang

arXiv:2508.06837 · cs.CR · January 22, 2026

TL;DR

This paper introduces Prometheus, a prompt-stealing attack that reconstructs the prompts behind text-to-image (T2I) showcases by iteratively querying a proxy model, combining dynamic modifiers with a greedy search, and achieves high attack success across multiple platforms.

Contribution

Proposes Prometheus, a training-free, search-based prompt-stealing method with dynamic modifiers and contextual matching, improving attack effectiveness and adaptability over prior fixed-modifier techniques.

Findings

01. Successfully extracts prompts from popular T2I platforms.

02. Achieves a 25% improvement in attack success rate.

03. Remains effective against common defense mechanisms.

Abstract

Text-to-Image (T2I) models, represented by DALL·E and Midjourney, have gained huge popularity for creating realistic images. The quality of these images relies on carefully engineered prompts, which have become valuable intellectual property. While skilled prompters showcase their AI-generated art on markets to attract buyers, this business incidentally exposes them to prompt stealing attacks. Existing state-of-the-art attack techniques reconstruct the prompts from a fixed set of modifiers (i.e., style descriptions) with model-specific training, which limits their adaptability and effectiveness across diverse showcases (i.e., target images) and diffusion models. To alleviate these limitations, we propose Prometheus, a training-free, proxy-in-the-loop, search-based prompt-stealing attack, which reverse-engineers the valuable prompts of the showcases by interacting…
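The abstract describes Prometheus as a search-based attack, and the TL;DR mentions a greedy search over dynamic modifiers. The Python sketch below illustrates only the generic idea of greedy modifier selection; it is not the authors' algorithm. The functions `greedy_modifier_search` and `toy_score_factory` are hypothetical, and the word-overlap (Jaccard) score is a toy stand-in, assumed here for illustration, for a real similarity measure between the proxy model's generated image and the target showcase.

```python
from typing import Callable, List, Sequence, Tuple


def greedy_modifier_search(
    subject: str,
    candidates: Sequence[str],
    score: Callable[[str], float],
    max_modifiers: int = 5,
) -> Tuple[str, float]:
    """Hypothetical sketch: greedily append the modifier that most raises score(prompt).

    Stops when no remaining candidate improves the score or when
    max_modifiers modifiers have been chosen.
    """
    chosen: List[str] = []
    remaining = list(candidates)
    best_prompt = subject
    best_score = score(best_prompt)
    while remaining and len(chosen) < max_modifiers:
        # Score every remaining modifier appended to the current prompt.
        trials = [(score(", ".join([subject, *chosen, m])), m) for m in remaining]
        trial_score, trial_mod = max(trials)  # ties broken by modifier string
        if trial_score <= best_score:
            break  # no candidate helps, so the greedy search terminates
        chosen.append(trial_mod)
        remaining.remove(trial_mod)
        best_score = trial_score
        best_prompt = ", ".join([subject, *chosen])
    return best_prompt, best_score


def toy_score_factory(target_description: str) -> Callable[[str], float]:
    """Toy stand-in (assumption): Jaccard word overlap with a target description.

    A real attack would instead compare images generated by a proxy T2I model
    against the target image.
    """
    target = set(target_description.lower().replace(",", " ").split())

    def score(prompt: str) -> float:
        words = set(prompt.lower().replace(",", " ").split())
        return len(words & target) / len(words | target)

    return score


if __name__ == "__main__":
    scorer = toy_score_factory("a castle at sunset, oil painting, highly detailed")
    modifiers = ["oil painting", "anime", "highly detailed", "sunset", "pixel art"]
    prompt, value = greedy_modifier_search("a castle", modifiers, scorer)
    print(prompt, value)  # → a castle, oil painting, highly detailed, sunset 0.875
```

Greedy selection evaluates each remaining modifier once per round, so it needs far fewer scoring queries than enumerating all modifier subsets, at the cost of possibly missing combinations that only help jointly.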

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection