One Prompt to Verify Your Models: Black-Box Text-to-Image Models Verification via Non-Transferable Adversarial Attacks
Ji Guo, Wenbo Jiang, Rui Zhang, Guoming Lu, Hongwei Li

TL;DR
This paper introduces VerifyPrompt, a method for verifying black-box text-to-image models by generating adversarial prompts that produce distinctive images, achieving over 90% accuracy in model verification tasks.
Contribution
It proposes a novel verification approach using non-transferable adversarial prompts optimized with genetic algorithms, addressing security concerns in third-party T2I services.
Findings
Achieves over 90% verification accuracy
Effective in real-world model platforms like Hugging Face
Utilizes NSGA-II for prompt optimization
Abstract
Recently, various types of Text-to-Image (T2I) models have emerged (such as DALL-E and Stable Diffusion), and showing their advantages in different aspects. Therefore, some third-party service platforms collect different model interfaces and provide cheaper API services and more flexibility in T2I model selections. However, this also raises a new security concern: Are these third-party services truly offering the models they claim? To answer this question, we first define the concept of T2I model verification, which aims to determine whether a black-box target model is identical to a given white-box reference T2I model. After that, we propose VerifyPrompt, which performs T2I model verification through a special designed verify prompt. Intuitionally, the verify prompt is an adversarial prompt for the target model without transferability for other models. It makes the target model…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsAdversarial Robustness in Machine Learning
Methodstravel james
