AEIOU: A Unified Defense Framework against NSFW Prompts in Text-to-Image Models

Yiming Wang; Jiahao Chen; Qingming Li; Tong Zhang; Rui Zeng; Xing Yang; Shouling Ji

arXiv:2412.18123·cs.CR·December 10, 2025

Open Access

TL;DR

AEIOU is a versatile, efficient, and interpretable framework that enhances safety in text-to-image models by accurately detecting NSFW prompts using hidden state features, outperforming existing moderation tools.

Contribution

This paper introduces AEIOU, a novel unified defense framework that leverages hidden state features for accurate, real-time NSFW prompt detection in T2I models, with broad adaptability and improved efficiency.

Findings

01 Achieves over 95% accuracy across datasets.

02 Improves detection efficiency by at least tenfold.

03 Effectively counters adaptive and multi-label attacks.

Abstract

As text-to-image (T2I) models advance and gain widespread adoption, their associated safety concerns are becoming increasingly critical. Malicious users exploit these models to generate Not-Safe-for-Work (NSFW) images using harmful or adversarial prompts, underscoring the need for effective safeguards to ensure the integrity and compliance of model outputs. However, existing detection methods are often inaccurate and inefficient. In this paper, we propose AEIOU, a defense framework that is adaptable, efficient, interpretable, optimizable, and unified against NSFW prompts in T2I models. AEIOU extracts NSFW features from the hidden states of the model's text encoder, utilizing the separable nature of these features to detect NSFW prompts. The detection process is efficient, requiring minimal inference time. AEIOU also offers real-time interpretation of results and supports…
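The abstract's core idea, that NSFW prompts are separable in the text encoder's hidden-state space, can be illustrated with a small sketch. This is not the authors' method: the abstract does not specify how AEIOU extracts or scores features, so the code below swaps in a simple difference-of-means linear probe. The vectors are synthetic stand-ins for hidden states, and every name (`fit_direction`, `is_flagged`, `synthetic_states`) is hypothetical.

```python
import random

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def fit_direction(nsfw_states, benign_states):
    """Return a unit direction (difference of class means) and a midpoint threshold.

    Assumption: the two classes are roughly linearly separable, which is
    the property the abstract attributes to hidden-state features.
    """
    mu_nsfw = mean_vector(nsfw_states)
    mu_benign = mean_vector(benign_states)
    diff = [a - b for a, b in zip(mu_nsfw, mu_benign)]
    norm = dot(diff, diff) ** 0.5
    direction = [d / norm for d in diff]
    # Place the threshold halfway between the projected class means.
    threshold = (dot(mu_nsfw, direction) + dot(mu_benign, direction)) / 2
    return direction, threshold

def is_flagged(state, direction, threshold):
    """Flag a hidden-state vector whose projection exceeds the threshold."""
    return dot(state, direction) > threshold

def synthetic_states(center, count, rng, noise=0.1):
    """Synthetic stand-ins for encoder hidden states: Gaussian noise around a center."""
    return [[c + rng.gauss(0, noise) for c in center] for _ in range(count)]

rng = random.Random(0)
DIM = 8  # toy dimensionality; real text-encoder hidden states are much larger
benign_center = [0.0] * DIM
nsfw_center = [1.0 if i < 2 else 0.0 for i in range(DIM)]

direction, threshold = fit_direction(
    synthetic_states(nsfw_center, 50, rng),
    synthetic_states(benign_center, 50, rng),
)
```

Scoring a prompt in this sketch costs a single dot product on a vector the encoder already computes, which is one plausible reading of why hidden-state detection can add little inference time. In a real pipeline the synthetic vectors would be replaced by hidden states taken from the T2I model's text encoder.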

Taxonomy

Topics: Adversarial Robustness in Machine Learning