T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model

Chenyu Zhang; Tairen Zhang; Lanjun Wang; Ruidong Chen; Wenhui Li; Anan Liu

arXiv:2510.22300 · cs.CR · November 24, 2025

TL;DR

This paper introduces T2I-RiskyPrompt, a benchmark with hierarchical risk categories and per-prompt risk reasons for evaluating the safety of text-to-image (T2I) models. It targets three weaknesses of existing risky-prompt datasets: limited category coverage, coarse-grained annotation, and low effectiveness.

Contribution

The paper develops a hierarchical risk taxonomy (6 primary categories, 14 fine-grained subcategories), constructs a dataset of 6,432 annotated risky prompts, and proposes a reason-driven detection method for safety evaluation in T2I models.

Findings

01. Identified strengths and limitations of current T2I models' safety

02. Provided insights into defense and attack strategies for T2I safety

03. Established a new benchmark for safety evaluation in T2I models

Abstract

Using risky text prompts, such as pornographic and violent prompts, to test the safety of text-to-image (T2I) models is a critical task. However, existing risky prompt datasets are limited in three key areas: 1) limited risk categories, 2) coarse-grained annotation, and 3) low effectiveness. To address these limitations, we introduce T2I-RiskyPrompt, a comprehensive benchmark designed for evaluating safety-related tasks in T2I models. Specifically, we first develop a hierarchical risk taxonomy, which consists of 6 primary categories and 14 fine-grained subcategories. Building upon this taxonomy, we construct a pipeline to collect and annotate risky prompts. Finally, we obtain 6,432 effective risky prompts, where each prompt is annotated with both hierarchical category labels and detailed risk reasons. Moreover, to facilitate the evaluation, we propose a reason-driven risky image…
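To make the annotation scheme concrete, here is a minimal sketch of how one annotated prompt could be represented and checked against a two-level taxonomy. Everything in it is an assumption for illustration: the category names are placeholders (the abstract does not list the actual 6 primary categories or 14 subcategories), the field names are hypothetical, and a single label per prompt is assumed even though the dataset may assign several.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical taxonomy: placeholder names, not the benchmark's real labels.
# Maps each primary category to its fine-grained subcategories.
TAXONOMY: dict[str, tuple[str, ...]] = {
    "primary_a": ("sub_a1", "sub_a2"),
    "primary_b": ("sub_b1",),
}


@dataclass(frozen=True)
class RiskyPrompt:
    """One annotated prompt (field names are illustrative assumptions)."""
    prompt: str
    primary: str       # primary risk category
    subcategory: str   # fine-grained subcategory under `primary`
    reason: str        # free-text explanation of why the prompt is risky


def is_valid(record: RiskyPrompt, taxonomy: dict[str, tuple[str, ...]] = TAXONOMY) -> bool:
    """True if the subcategory belongs to the primary category and a reason is present."""
    in_taxonomy = record.subcategory in taxonomy.get(record.primary, ())
    return in_taxonomy and bool(record.reason.strip())


def count_by_primary(records: list[RiskyPrompt]) -> dict[str, int]:
    """Number of records per primary category."""
    return dict(Counter(r.primary for r in records))
```

A check like `is_valid` is one way to keep hierarchical labels consistent, since it rejects a subcategory filed under the wrong primary category as well as an empty risk reason.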

Code & Models

Datasets

datarr/T2I-RiskyPrompt-ImageDataset
dataset · 2.8k downloads

Videos

T2I-RiskyPrompt: A Benchmark for Safety Evaluation, Attack, and Defense on Text-to-Image Model