SAGE: Exploring the Boundaries of Unsafe Concept Domain with Semantic-Augment Erasing

Hongguang Zhu; Yunchao Wei; Mengyu Wang; Siyu Jiao; Yan Fang; Jiannan Huang; Yao Zhao

arXiv:2506.09363·cs.CV·June 12, 2025

TL;DR

SAGE introduces semantic-augment erasing, a method that lets diffusion models unlearn unsafe concepts more effectively. It improves the safety of text-to-image generation without additional data while balancing concept erasure against retention.

Contribution

The paper proposes a novel semantic-augment erasing technique and a global-local retention mechanism to improve unsafe concept removal in diffusion models.

Findings

01 SAGE outperforms existing methods in safe content generation.

02 It effectively unlearns unsafe concepts without degrading irrelevant concepts.

03 The approach does not require additional preprocessed data.

Abstract

Diffusion models (DMs) have achieved significant progress in text-to-image generation. However, the inevitable inclusion of sensitive information during pre-training poses safety risks, such as unsafe content generation and copyright infringement. Concept erasing, which fine-tunes weights to unlearn undesirable concepts, has emerged as a promising solution. However, existing methods treat an unsafe concept as a fixed word and repeatedly erase it, trapping DMs in a "word concept abyss" that prevents generalized concept-related erasing. To escape this abyss, we introduce semantic-augment erasing, which transforms concept word erasure into concept domain erasure through cyclic self-check and self-erasure. It efficiently explores and unlearns the boundary representation of the concept domain through semantic spatial relationships between the original and training DMs, without requiring additional…

Code & Models

Repositories

kevinlight831/sage · Official

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Topic Modeling