TL;DR
This paper introduces Concept Pinpoint Eraser (CPE), a framework that adds nonlinear Residual Attention Gates (ResAGs) to text-to-image diffusion models in order to erase specific target concepts while preserving the remaining concepts, with improved robustness over prior erasure methods.
Contribution
The paper proposes a novel nonlinear residual attention gate framework with attention anchoring and adversarial training for precise concept erasure in diffusion models, outperforming prior methods.
Findings
CPE effectively erases target concepts like celebrities and styles.
CPE preserves diverse remaining concepts with high robustness.
CPE outperforms prior methods in concept erasure tasks.
Abstract
Remarkable progress in text-to-image diffusion models has brought a major concern about potentially generating images on inappropriate or trademarked concepts. Concept erasing has been investigated with the goals of deleting target concepts in diffusion models while preserving other concepts with minimal distortion. To achieve these goals, recent concept erasing methods usually fine-tune the cross-attention layers of diffusion models. In this work, we first show that merely updating the cross-attention layers in diffusion models, which is mathematically equivalent to adding linear modules to weights, may not be able to preserve diverse remaining concepts. Then, we propose a novel framework, dubbed Concept Pinpoint Eraser (CPE), by adding nonlinear Residual Attention Gates (ResAGs) that selectively erase (or cut) target concepts while safeguarding remaining concepts from…
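The linear-equivalence claim in the abstract can be checked numerically. The sketch below is a minimal NumPy illustration, not the paper's code: the shapes, the random matrices, and the function names are assumptions chosen for the example. It shows that fine-tuning a cross-attention projection W into W + ΔW gives the same output as keeping W frozen and adding a separate linear module ΔW.

```python
import numpy as np


def finetuned_projection(W, delta_W, E):
    """Project embeddings with the fine-tuned weight W + delta_W."""
    return (W + delta_W) @ E


def frozen_plus_linear_module(W, delta_W, E):
    """Keep W frozen and add the output of a separate linear module delta_W."""
    return W @ E + delta_W @ E


# Hypothetical sizes: 8-dim text embeddings, 4-dim attention space, 5 tokens.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))               # pretrained key/value projection
delta_W = 0.1 * rng.normal(size=(4, 8))   # update learned by fine-tuning
E = rng.normal(size=(8, 5))               # one column per text token

# Compare the two views of the same update (equal up to floating-point error).
print(np.allclose(finetuned_projection(W, delta_W, E),
                  frozen_plus_linear_module(W, delta_W, E)))
```

In this toy setting the added term ΔW E is applied to every embedding E, whether or not it belongs to a target concept, which is the property the abstract contrasts with the nonlinear ResAGs.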
Peer Reviews
Decision: ICLR 2025 Poster
1. This paper shows, through a solid mathematical proof, that simply modifying the linear mapping matrix of cross-attention is inadequate, which motivates the nonlinear ResAGs module and a novel training loss function. 2. This paper conducts a large number of experiments, covering character privacy, art style, NSFW content, etc., and compares against many baseline methods.
1. The major weakness of this work is that the proposed method may not work in open-source scenarios. Methods like ESD, UCE, RECE, etc. directly modify the UNet, and the LoRA of MACE can be merged into the UNet, so all these changes can be applied directly to the UNet, and the user cannot simply bypass these security mechanisms. CPE uses a nonlinear add-on module, which cannot be merged into the UNet, so in the open-source scenario, the user can directly delete the code of the add-on module, thu
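The merging argument in this review can be illustrated with a small NumPy sketch. It is a toy under stated assumptions, not CPE or MACE code: `lora_merge` and `gated_addon` are hypothetical names, and the sigmoid gate is a stand-in nonlinearity rather than the paper's ResAG architecture. A low-rank linear update folds into the weight matrix, while a gated add-on fails the additivity test that any mergeable (linear) map must pass.

```python
import numpy as np


def lora_merge(W, B, A):
    """Fold a low-rank linear update B @ A into the base weight."""
    return W + B @ A


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def gated_addon(E, B, A, u):
    """Toy nonlinear add-on: a per-token sigmoid gate scales a low-rank update."""
    gate = sigmoid(u @ E)          # shape (n_tokens,), one gate value per column
    return (B @ (A @ E)) * gate    # gate broadcasts across the rows


# Hypothetical sizes and random values, chosen only for illustration.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))
B = rng.normal(size=(4, 2))
A = rng.normal(size=(2, 8))
u = rng.normal(size=8)
E1 = rng.normal(size=(8, 5))
E2 = rng.normal(size=(8, 5))

# Linear case: the merged weight reproduces base output plus the add-on output.
merged_ok = np.allclose(lora_merge(W, B, A) @ E1, W @ E1 + B @ (A @ E1))

# Nonlinear case: check additivity; a map that fails it is not a single matrix.
additive = np.allclose(gated_addon(E1 + E2, B, A, u),
                       gated_addon(E1, B, A, u) + gated_addon(E2, B, A, u))
print(merged_ok, additive)
```

In this toy, dropping the `gated_addon` term leaves the base output `W @ E` untouched, which mirrors the reviewer's concern that a separate add-on module can simply be removed.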
1. To my knowledge, there are no significant flaws in the proofs of theorems; the work appears solid and reliable. 2. The experiments demonstrate the effectiveness of this approach, and the performance gains are compelling enough to validate the work, even though it shows weaker results on certain erasing tasks. 3. Additionally, the ablation study confirms the effectiveness of each component.
1. The robustness training seems to be a separate endeavor from the residual attention gates, while adversarial training acts more as a supplementary effort to enhance ResAGs. The robustness evaluation is directly performed on the model assessment. Does this indicate that the robustness of the model achieved through direct training is weaker than that of other models? 2. Given the proof, the addition of an extra term could be represented by various architectures. However, why do ResAGs demonstrat
1. The theoretical analysis of finetuning cross-attention layers is solid and sound. 2. The experimental results show that the proposed CPE successfully preserves the remaining concepts while erasing the target concept better than previous methods across various domains. 3. Each component in CPE is reasonable and shown to be effective in the ablation study. 4. The paper is well-organized and easy to follow.
1. The assumption for the theoretical analysis of the proposed ResAG raises some concerns. According to the paper, Equation 4 holds if the modes of samples can be detected and if $||f(E)||^2_F$ is small for remaining concepts. How can one determine the modes of samples? Additionally, what threshold for $||f(E)||^2_F$ is considered "small enough" to effectively protect the remaining concepts?
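For reference, the quantity $||f(E)||^2_F$ in this question is the squared Frobenius norm, i.e. the sum of squared entries of the matrix $f(E)$. The snippet below is a minimal NumPy sketch with a made-up matrix standing in for $f(E)$; it does not suggest any threshold, which the review leaves open.

```python
import numpy as np


def squared_frobenius_norm(X):
    """Sum of squared entries of X, i.e. ||X||_F^2."""
    return float(np.sum(np.square(X)))


# Made-up stand-in for f(E); the values are purely illustrative.
f_E = np.array([[1.0, 2.0],
                [3.0, 4.0]])

print(squared_frobenius_norm(f_E))            # 1 + 4 + 9 + 16 = 30.0
print(np.linalg.norm(f_E, ord="fro") ** 2)    # same quantity via NumPy's norm
```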