Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models

Hao Chen; Yiwei Wang; Songze Li

arXiv:2512.13039 · cs.CV · December 17, 2025

TL;DR

Bi-Erasing introduces a bidirectional framework that simultaneously suppresses harmful concepts and promotes safe alternatives in diffusion models, improving concept removal while maintaining image quality.

Contribution

The paper proposes a bidirectional approach with dual image branches, one suppressing the harmful concept and one reinforcing safe alternatives, to balance concept erasure against safety enhancement in diffusion models.

Findings

01 Outperforms baseline methods in concept removal effectiveness.

02 Maintains higher visual fidelity during concept erasure.

03 Effectively balances safety and image quality.

Abstract

Concept erasure, which fine-tunes diffusion models to remove undesired or harmful visual concepts, has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. However, existing removal methods typically adopt a unidirectional erasure strategy by either suppressing the target concept or reinforcing safe alternatives, making it difficult to achieve a balanced trade-off between concept removal and generation quality. To address this limitation, we propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously. Specifically, based on the joint representation of text prompts and corresponding images, Bi-Erasing introduces two decoupled image branches: a negative branch responsible for suppressing harmful semantics and a positive branch providing visual…
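The abstract is cut off before it states the training objective, so the following is only a toy sketch of what "bidirectional" guidance could look like, not the loss used in the paper. It assumes a guidance-style arithmetic on noise predictions: start from a base prediction, step away from a negative-branch prediction, and step toward a positive-branch prediction. All names (`bidirectional_target`, `eps_base`, `eps_neg`, `eps_pos`, `w_neg`, `w_pos`) are hypothetical, and plain Python lists stand in for tensors.

```python
def bidirectional_target(eps_base, eps_neg, eps_pos, w_neg=1.0, w_pos=1.0):
    """Toy regression target (an assumption, not the paper's objective).

    Moves the base prediction away from the negative (harmful) branch
    and toward the positive (safe) branch, element by element.
    """
    return [
        b - w_neg * (n - b) + w_pos * (p - b)
        for b, n, p in zip(eps_base, eps_neg, eps_pos)
    ]


def mse(a, b):
    """Mean squared error between two equal-length lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


# Hypothetical noise predictions for a 3-element "latent".
eps_base = [0.0, 1.0, -1.0]   # prediction without either branch
eps_neg = [1.0, 1.0, 0.0]     # prediction conditioned on the harmful concept
eps_pos = [0.0, 2.0, -1.0]    # prediction conditioned on the safe alternative

target = bidirectional_target(eps_base, eps_neg, eps_pos)

# A fine-tuned model would be trained to reduce this error toward zero.
loss_before = mse(eps_base, target)
loss_at_target = mse(target, target)
```

Setting `w_neg=0` recovers a purely "reinforce the safe alternative" strategy and `w_pos=0` a purely "suppress the target" one, which are the two unidirectional cases the abstract contrasts with the bidirectional approach.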

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection