GuardT2I: Defending Text-to-Image Models from Adversarial Prompts

Yijun Yang; Ruiyuan Gao; Xiao Yang; Jianyuan Zhong; Qiang Xu

arXiv:2403.01446·cs.CV·October 31, 2024·3 cites

TL;DR

GuardT2I introduces a generative moderation framework using large language models to detect and mitigate adversarial prompts in text-to-image models, significantly improving safety without harming performance.

Contribution

This work presents a novel generative approach with LLMs for adversarial prompt detection in T2I models, surpassing existing moderation solutions.

Findings

01  Outperforms OpenAI-Moderation and Azure Moderator in adversarial scenarios

02  Enhances T2I safety without degrading image generation quality

03  Utilizes LLMs for conditional transformation of text prompts

Abstract

Recent advancements in Text-to-Image (T2I) models have raised significant safety concerns about their potential misuse for generating inappropriate or Not-Safe-For-Work (NSFW) contents, despite existing countermeasures such as NSFW classifiers or model fine-tuning for inappropriate concept removal. Addressing this challenge, our study unveils GuardT2I, a novel moderation framework that adopts a generative approach to enhance T2I models' robustness against adversarial prompts. Instead of making a binary classification, GuardT2I utilizes a Large Language Model (LLM) to conditionally transform text guidance embeddings within the T2I models into natural language for effective adversarial prompt detection, without compromising the models' inherent performance. Our extensive experiments reveal that GuardT2I outperforms leading commercial solutions like OpenAI-Moderation and Microsoft Azure…
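
The abstract describes checking a natural-language interpretation of the prompt rather than classifying the prompt directly. The sketch below illustrates only that final checking step, under loud assumptions: the LLM that decodes the T2I text embedding is not implemented, so the interpretation is passed in as a plain string. The blocklist, the token-overlap similarity, the 0.3 threshold, and every function name are hypothetical stand-ins chosen for illustration. None of them are taken from the paper.

```python
from dataclasses import dataclass
from typing import Set

# Hypothetical blocklist for illustration; not from the paper.
SENSITIVE_WORDS = {"nude", "gore"}
_PUNCT = ".,!?;:\"'"


def tokenize(text: str) -> Set[str]:
    """Lowercase, whitespace-split tokens with surrounding punctuation removed."""
    tokens = (w.strip(_PUNCT).lower() for w in text.split())
    return {t for t in tokens if t}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Token-set overlap; a crude stand-in for a learned sentence similarity."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class Verdict:
    flagged: bool
    reason: str


def moderate(prompt: str, interpretation: str, min_similarity: float = 0.3) -> Verdict:
    """Flag a prompt using its decoded interpretation (assumed to come from an LLM)."""
    interp_tokens = tokenize(interpretation)

    # Check 1: the interpretation states something explicitly sensitive.
    hits = SENSITIVE_WORDS & interp_tokens
    if hits:
        return Verdict(True, "sensitive words: " + ", ".join(sorted(hits)))

    # Check 2: the interpretation diverges from what the user literally typed,
    # which may indicate an obfuscated (adversarial) prompt.
    sim = jaccard(tokenize(prompt), interp_tokens)
    if sim < min_similarity:
        return Verdict(True, f"prompt and interpretation diverge ({sim:.2f})")

    return Verdict(False, "ok")
```

For example, `moderate("a cat on a sofa", "a cat on a sofa")` is not flagged, while a nonsense-looking prompt whose interpretation contains a blocklisted word is. Because the check runs on the interpretation rather than on the image pipeline, this kind of design leaves image generation itself untouched, which is consistent with the abstract's claim of not compromising the model's inherent performance.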

Taxonomy

Topics: Adversarial Robustness in Machine Learning