Bi-Erasing: A Bidirectional Framework for Concept Removal in Diffusion Models
Hao Chen, Yiwei Wang, Songze Li

TL;DR
Bi-Erasing introduces a bidirectional framework that simultaneously suppresses harmful concepts and promotes safe alternatives in diffusion models, improving concept removal while maintaining image quality.
Contribution
It proposes a novel bidirectional approach with dual image branches for balanced concept erasure and safety enhancement in diffusion models.
Findings
Outperforms baseline methods in concept removal effectiveness.
Maintains higher visual fidelity during concept erasure.
Effectively balances safety and image quality.
Abstract
Concept erasure, which fine-tunes diffusion models to remove undesired or harmful visual concepts, has become a mainstream approach to mitigating unsafe or illegal image generation in text-to-image models. However, existing removal methods typically adopt a unidirectional erasure strategy, either suppressing the target concept or reinforcing safe alternatives, which makes it difficult to balance concept removal against generation quality. To address this limitation, we propose a novel Bidirectional Image-Guided Concept Erasure (Bi-Erasing) framework that performs concept suppression and safety enhancement simultaneously. Specifically, based on the joint representation of text prompts and corresponding images, Bi-Erasing introduces two decoupled image branches: a negative branch responsible for suppressing harmful semantics and a positive branch providing visual…
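The abstract is cut off before it states the training objective, so the paper's actual loss is not available here. As a rough illustration only of what "suppress and promote at the same time" can mean, the toy sketch below combines a negative direction and a positive direction into a single target vector. The function name `bidirectional_target`, the parameters `neg_scale` and `pos_scale`, and the guidance-style formula are all assumptions made for this example, not the authors' method.

```python
from typing import List

Vector = List[float]


def bidirectional_target(
    eps_base: Vector,
    eps_harm: Vector,
    eps_safe: Vector,
    neg_scale: float = 1.0,
    pos_scale: float = 1.0,
) -> Vector:
    """Hypothetical target for a bidirectional update (not the paper's loss).

    eps_base: prediction without any concept conditioning
    eps_harm: prediction conditioned on the harmful concept (negative branch)
    eps_safe: prediction conditioned on a safe alternative (positive branch)

    The target moves away from the harmful direction and toward the safe one.
    """
    return [
        b - neg_scale * (h - b) + pos_scale * (s - b)
        for b, h, s in zip(eps_base, eps_harm, eps_safe)
    ]


# Toy 2-D vectors: axis 0 stands for "harmful", axis 1 for "safe".
eps_base = [0.0, 0.0]
eps_harm = [1.0, 0.0]
eps_safe = [0.0, 1.0]

both = bidirectional_target(eps_base, eps_harm, eps_safe)
suppress_only = bidirectional_target(eps_base, eps_harm, eps_safe, pos_scale=0.0)
promote_only = bidirectional_target(eps_base, eps_harm, eps_safe, neg_scale=0.0)
```

Setting either scale to zero recovers a unidirectional variant in this toy setup, which is the kind of one-sided strategy the abstract contrasts against.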
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection
