ActErase: A Training-Free Paradigm for Precise Concept Erasure via Activation Redirection
Yi Sun, Xinhao Zhong, Hongyan Li, Yimin Zhou, Junhao Li, Bin Chen, Xuan Wang

TL;DR
ActErase is a training-free activation-redirection method for precise concept erasure in diffusion models. It addresses safety and ethical concerns without any fine-tuning.
Contribution
It proposes a novel, training-free approach that identifies and replaces activation differences for concept erasure, outperforming existing methods in efficiency and effectiveness.
Findings
Achieves state-of-the-art erasure performance across multiple tasks
Effectively preserves the model's overall generative capability
Demonstrates robustness against adversarial attacks
Abstract
Recent advances in text-to-image diffusion models have demonstrated remarkable generation capabilities, yet they raise significant concerns regarding safety, copyright, and ethical implications. Existing concept erasure methods address these risks by removing sensitive concepts from pre-trained models, but most of them rely on data-intensive and computationally expensive fine-tuning, which poses a critical limitation. To overcome this limitation, we build on the observation that a model's activations are predominantly composed of generic concepts, with only a minimal component representing the target concept, and propose a novel training-free method (ActErase) for efficient concept erasure. Specifically, the proposed method operates by identifying activation difference regions via prompt-pair analysis, extracting target activations, and dynamically replacing input activations during…
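The steps named in the abstract (prompt-pair difference analysis, then replacement of the affected activations) can be pictured with a small NumPy toy. This is a sketch under stated assumptions, not the authors' implementation: the function names `find_difference_mask` and `redirect_activations`, the top-fraction selection rule, and the choice to overwrite with the neutral prompt's activations are all hypothetical, and the arrays are synthetic stand-ins for real diffusion-model activations.

```python
import numpy as np

def find_difference_mask(act_concept, act_neutral, top_fraction=0.05):
    """Hypothetical helper: mark the positions where two activations differ most.

    act_concept / act_neutral stand in for activations recorded from a prompt
    pair (one prompt with the target concept, one without). The top-fraction
    rule is an assumption made for this illustration.
    """
    diff = np.abs(act_concept - act_neutral)
    k = max(1, int(round(top_fraction * diff.size)))
    # Value of the k-th largest difference; everything at or above it is kept.
    threshold = np.partition(diff.ravel(), -k)[-k]
    return diff >= threshold

def redirect_activations(act_input, mask, act_replacement):
    """Hypothetical helper: overwrite masked positions, leave the rest untouched."""
    return np.where(mask, act_replacement, act_input)

# Synthetic data: a shared "generic" part plus a small concept-specific part,
# mirroring the observation that the target concept is a minimal component.
rng = np.random.default_rng(0)
generic = rng.normal(size=(4, 8))
concept_component = np.zeros((4, 8))
concept_component[1, 2:4] = 5.0

act_neutral = generic
act_concept = generic + concept_component

mask = find_difference_mask(act_concept, act_neutral, top_fraction=2 / 32)
redirected = redirect_activations(act_concept, mask, act_neutral)
```

In this toy setting only the two positions carrying the concept-specific component are selected, so the remaining activations pass through unchanged. How ActErase actually selects regions and which activations it substitutes is not recoverable from the truncated abstract.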
