Groot: Adversarial Testing for Generative Text-to-Image Models with Tree-based Semantic Transformation

Yi Liu; Guowei Yang; Gelei Deng; Feiyue Chen; Yuqi Chen; Ling Shi; Tianwei Zhang; and Yang Liu

arXiv:2402.12100 · cs.CL · February 20, 2024 · 3 cites

Open Access

TL;DR

Groot is an automated framework that uses tree-based semantic transformations and large language models to effectively generate adversarial prompts, significantly improving safety testing of text-to-image models like DALL-E 3 and Midjourney.

Contribution

Groot introduces a novel, automated adversarial testing framework utilizing semantic decomposition and LLMs, outperforming existing methods in safety evaluation of text-to-image models.

Findings

01 Achieves a 93.66% success rate on leading text-to-image models such as DALL-E 3 and Midjourney.

02 Outperforms current state-of-the-art adversarial testing methods.

03 Probes safety vulnerabilities in leading text-to-image models by systematically refining adversarial prompts.

Abstract

With the prevalence of text-to-image generative models, their safety becomes a critical concern. Adversarial testing techniques have been developed to probe whether such models can be prompted to produce Not-Safe-For-Work (NSFW) content. However, existing solutions face several challenges, including low success rate and inefficiency. We introduce Groot, the first automated framework leveraging tree-based semantic transformation for adversarial testing of text-to-image models. Groot employs semantic decomposition and sensitive element drowning strategies in conjunction with LLMs to systematically refine adversarial prompts. Our comprehensive evaluation confirms the efficacy of Groot, which not only exceeds the performance of current state-of-the-art approaches but also achieves a remarkable success rate (93.66%) on leading text-to-image models such as DALL-E 3 and Midjourney.

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Explainable Artificial Intelligence (XAI)