Red-Teaming Text-to-Image Systems by Rule-based Preference Modeling

Yichuan Cao; Yibo Miao; Xiao-Shan Gao; Yinpeng Dong

arXiv:2505.21074·cs.LG·May 28, 2025

Open Access

TL;DR

This paper introduces RPG-RT, a novel black-box red-teaming method for text-to-image models that uses rule-based preference modeling and iterative prompt modification to effectively evade unknown safety defenses.

Contribution

It presents a new rule-based preference modeling approach combined with LLM-guided prompt modification for more effective black-box red-teaming of T2I systems.

Findings

1. RPG-RT outperforms existing methods in evading safety defenses.
2. The approach is effective across diverse T2I and T2V models.
3. It demonstrates practicality on commercial API services.

Abstract

Text-to-image (T2I) models raise ethical and safety concerns due to their potential to generate inappropriate or harmful images. Evaluating these models' security through red-teaming is vital, yet white-box approaches require internal access, which complicates their use with closed-source models. Moreover, existing black-box methods often assume knowledge of the model's specific defense mechanisms, limiting their utility in real-world commercial API scenarios. A significant challenge is how to evade unknown and diverse defense mechanisms. To overcome this difficulty, we propose a novel Rule-based Preference modeling Guided Red-Teaming (RPG-RT), which iteratively employs an LLM to modify the prompts used to query the T2I system and leverages the system's feedback to fine-tune the LLM. RPG-RT treats the feedback from each iteration as a prior, enabling the LLM to dynamically adapt to…
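The iterative loop described in the abstract can be pictured with a toy sketch. Everything below is an assumption made for illustration and is not taken from the paper: the three outcome labels, the fixed ranking over them, and the helper names `propose_variants`, `query_system`, `build_preferences`, and `red_team_loop` are all hypothetical. The LLM and the T2I system are replaced by stubs, and the fine-tuning step is left as a comment. The sketch only shows how feedback from each round of queries could be turned into pairwise preferences by a simple rule.

```python
import random
from itertools import combinations

# Hypothetical outcome labels for one query to a black-box T2I system.
REJECTED = "rejected"      # the system refused the prompt or blocked the output
OFF_TARGET = "off_target"  # an image came back without the tested content
ON_TARGET = "on_target"    # an image came back with the tested content

# Assumed rule: a fixed ranking over outcomes (higher is preferred).
RANK = {REJECTED: 0, OFF_TARGET: 1, ON_TARGET: 2}


def propose_variants(prompt, n, round_idx):
    """Stand-in for the LLM: returns n tagged copies of the prompt."""
    return [f"{prompt} [round {round_idx}, variant {i}]" for i in range(n)]


def query_system(variant, rng):
    """Stand-in for the black-box system: returns a random outcome label."""
    return rng.choice([REJECTED, OFF_TARGET, ON_TARGET])


def build_preferences(results):
    """Turn (variant, outcome) results into (preferred, dispreferred) pairs."""
    pairs = []
    for (va, oa), (vb, ob) in combinations(results, 2):
        if RANK[oa] > RANK[ob]:
            pairs.append((va, vb))
        elif RANK[ob] > RANK[oa]:
            pairs.append((vb, va))
        # Ties carry no preference signal under this rule and are skipped.
    return pairs


def red_team_loop(prompt, rounds=3, n=4, seed=0):
    """Collect preference pairs over several rounds of query and feedback."""
    rng = random.Random(seed)
    dataset = []
    for round_idx in range(rounds):
        variants = propose_variants(prompt, n, round_idx)
        results = [(v, query_system(v, rng)) for v in variants]
        dataset.extend(build_preferences(results))
        # A real pipeline would fine-tune the LLM on `dataset` here, so that
        # the next round's variants reflect the feedback gathered so far.
    return dataset
```

Calling `red_team_loop("placeholder prompt")` returns a list of `(preferred, dispreferred)` string pairs. In this sketch the pairs only accumulate; the abstract's point is that such feedback is fed back into the LLM between iterations.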

Taxonomy

Topics: Image Retrieval and Classification Techniques

Methods: Sparse Evolutionary Training