Wukong Framework for Not Safe For Work Detection in Text-to-Image systems

Mingrui Liu; Sixiao Zhang; Cheng Long

arXiv:2508.00591 · cs.CV · January 21, 2026

Open Access

TL;DR

Wukong is a transformer-based framework integrated into diffusion models for early, efficient, and accurate NSFW detection in text-to-image generation, leveraging intermediate denoising outputs and cross-attention features.

Contribution

The paper introduces Wukong, a novel method that detects NSFW content during the diffusion process by utilizing intermediate outputs and shared attention parameters, improving efficiency and accuracy.
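
To make the idea of checking during generation concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a denoising loop in which a detector inspects the intermediate state at one early step and aborts generation if the score crosses a threshold. Every name in it (`toy_denoise_step`, `toy_nsfw_score`, `generate_with_early_check`, `check_step`, `threshold`) is hypothetical, and the "denoiser" and "detector" are trivial numeric stand-ins rather than a U-Net or a transformer.

```python
import math
import random
from typing import List, Optional, Tuple

Latent = List[float]


def toy_denoise_step(latent: Latent, step: int, total_steps: int) -> Latent:
    # Hypothetical stand-in for one denoising step of a diffusion model:
    # it only shrinks the latent slightly on each call.
    scale = 1.0 - 1.0 / (total_steps - step + 1)
    return [x * scale for x in latent]


def toy_nsfw_score(latent: Latent, prompt_features: List[float]) -> float:
    # Hypothetical stand-in for a detector that looks at the intermediate
    # latent together with prompt-derived features. Here it is just the
    # sigmoid of a dot product, so the score lies strictly between 0 and 1.
    z = sum(a * b for a, b in zip(latent, prompt_features))
    return 1.0 / (1.0 + math.exp(-z))


def generate_with_early_check(
    prompt_features: List[float],
    total_steps: int,
    check_step: int,
    threshold: float,
    seed: int = 0,
) -> Tuple[Optional[Latent], int, float]:
    """Run the toy loop, scoring the latent once after `check_step` steps.

    Returns (latent or None if aborted, number of steps run, score).
    """
    if not 1 <= check_step <= total_steps:
        raise ValueError("check_step must be between 1 and total_steps")

    rng = random.Random(seed)
    latent = [rng.gauss(0.0, 1.0) for _ in prompt_features]
    score = 0.0

    for step in range(total_steps):
        latent = toy_denoise_step(latent, step, total_steps)
        if step + 1 == check_step:
            score = toy_nsfw_score(latent, prompt_features)
            if score >= threshold:
                # Flagged early: skip the remaining denoising steps.
                return None, step + 1, score

    return latent, total_steps, score


if __name__ == "__main__":
    features = [0.5, -0.2, 0.1]  # made-up prompt features
    result = generate_with_early_check(features, total_steps=10, check_step=3, threshold=0.5)
    print(result)
```

In this toy setup, a prompt flagged at step 3 of 10 never pays for the remaining 7 steps, which is the efficiency argument for checking inside the loop rather than filtering the final image.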

Findings

1. Wukong outperforms text-based safeguards in accuracy.

2. Wukong achieves results comparable to those of image filters.

3. Wukong enables early NSFW detection during image generation.

Abstract

Text-to-Image (T2I) generation is a popular AI-generated content (AIGC) technology enabling diverse and creative image synthesis. However, some outputs may contain Not Safe For Work (NSFW) content (e.g., violence), violating community guidelines. Detecting NSFW content efficiently and accurately, known as external safeguarding, is essential. Existing external safeguards fall into two types: text filters, which analyze user prompts but overlook T2I model-specific variations and are prone to adversarial attacks; and image filters, which analyze final generated images but are computationally costly and introduce latency. Diffusion models, the foundation of modern T2I systems like Stable Diffusion, generate images through iterative denoising using a U-Net architecture with ResNet and Transformer blocks. We observe that: (1) early denoising steps define the semantic layout of the image, and…

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Generative Adversarial Networks and Image Synthesis · Hate Speech and Cyberbullying Detection