When Safety Collides: Resolving Multi-Category Harmful Conflicts in Text-to-Image Diffusion via Adaptive Safety Guidance

Yongli Xiang; Ziming Hong; Zhaoqing Wang; Xiangyu Zhao; Bo Han; Tongliang Liu

arXiv:2602.20880 · cs.CV · March 24, 2026

Open Access

TL;DR

This paper introduces CASG, a training-free framework that dynamically resolves conflicts among multiple harmful content categories in text-to-image diffusion models, reducing the harmful content rate by up to 15.4%.

Contribution

The paper proposes a novel conflict-aware safety guidance method that adaptively identifies and mitigates multi-category harmful content during image generation.

Findings

01 Reduces harmful content rate by up to 15.4%

02 Outperforms existing safety mitigation methods

03 Applicable to both latent-space and text-space safeguards

Abstract

Text-to-Image (T2I) diffusion models have demonstrated significant advancements in generating high-quality images, while raising potential safety concerns regarding harmful content generation. Safety-guidance-based methods have been proposed to mitigate harmful outputs by steering generation away from harmful zones, where the zones are averaged across multiple harmful categories based on predefined keywords. However, these approaches fail to capture the complex interplay among different harm categories, leading to "harmful conflicts" where mitigating one type of harm may inadvertently amplify another, thus increasing overall harmful rate. To address this issue, we propose Conflict-aware Adaptive Safety Guidance (CASG), a training-free framework that dynamically identifies and applies the category-aligned safety direction during generation. CASG is composed of two components: (i)…
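
The contrast the abstract draws between an averaged harmful zone and a category-aligned safety direction can be illustrated with a toy sketch. This is not the authors' implementation: the 2-D vectors, the cosine-similarity selection rule, and the helper names (`averaged_direction`, `most_aligned_category`, `steer_away`) are all assumptions made for illustration, and the abstract's description of CASG's actual components is truncated above.

```python
import numpy as np


def unit(v):
    """Return v scaled to unit length."""
    return v / np.linalg.norm(v)


def averaged_direction(category_dirs):
    """Baseline: one harmful direction obtained by averaging all categories."""
    return category_dirs.mean(axis=0)


def most_aligned_category(x, category_dirs):
    """Index of the category whose direction has the highest cosine similarity with x."""
    cos = np.array([unit(d) @ unit(x) for d in category_dirs])
    return int(np.argmax(cos))


def steer_away(x, harmful_dir, scale=1.0):
    """Move x a fixed step against the (normalized) harmful direction."""
    return x - scale * unit(harmful_dir)


# Toy 2-D directions for two hypothetical harm categories that partly oppose each other.
category_dirs = np.array([[1.0, 0.0],    # category A (assumed)
                          [-0.8, 0.6]])  # category B (assumed)
x = np.array([0.9, 0.1])  # toy state that leans toward category A

idx = most_aligned_category(x, category_dirs)
x_avg = steer_away(x, averaged_direction(category_dirs))
x_aligned = steer_away(x, category_dirs[idx])

# Residual component along the category the state actually leans toward.
residual_avg = float(x_avg @ unit(category_dirs[idx]))
residual_aligned = float(x_aligned @ unit(category_dirs[idx]))
print(idx, residual_avg, residual_aligned)
```

In this toy setup the two category directions largely cancel when averaged, so steering against the average leaves a larger component along category A than steering against the selected category direction does. Real safety guidance operates on model latents or text embeddings during denoising, which this sketch does not attempt to reproduce.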

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Image Enhancement Techniques