Consistency-Preserving Concept Erasure via Unsafe-Safe Pairing and Directional Fisher-weighted Adaptation

Yongwoo Kim; Sungmin Cha; Hyunsoo Kim; Jaewon Lee; Donghyun Kim

arXiv:2602.05339 · cs.CV · February 6, 2026

TL;DR

This paper introduces PAIR, a novel framework for concept erasure in text-to-image models that preserves semantic consistency by using unsafe-safe concept pairs and Fisher-weighted adaptation.

Contribution

The paper proposes a new consistency-preserving concept erasure method leveraging unsafe-safe pairs and Fisher-weighted initialization, improving over prior approaches.

Findings

01. Outperforms state-of-the-art baselines in concept erasure tasks.

02. Maintains structural and semantic integrity after concept removal.

03. Generates safe alternatives with high fidelity and coherence.

Abstract

With the increasing versatility of text-to-image diffusion models, the ability to selectively erase undesirable concepts (e.g., harmful content) has become indispensable. However, existing concept erasure approaches primarily focus on removing unsafe concepts without providing guidance toward corresponding safe alternatives, which often leads to failure in preserving the structural and semantic consistency between the original and erased generations. In this paper, we propose a novel framework, PAIRed Erasing (PAIR), which reframes concept erasure from simple removal to consistency-preserving semantic realignment using unsafe-safe pairs. We first generate safe counterparts from unsafe inputs while preserving structural and semantic fidelity, forming paired unsafe-safe multimodal data. Leveraging these pairs, we introduce two key components: (1) Paired Semantic Realignment, a guided…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Domain Adaptation and Few-Shot Learning · Machine Learning in Healthcare