Closed-Form Concept Erasure via Double Projections
Chi Zhang, Jingpu Cheng, Zhixian Wang, Ping Liu

TL;DR
This paper introduces a simple, analytical linear transformation method for concept erasure in generative models. The method is efficient and interpretable, and it matches or exceeds existing techniques in erasure performance.
Contribution
It proposes a closed-form, training-free approach for concept removal that is deterministic and geometrically interpretable, and that preserves non-target concepts better than existing approaches.
Findings
Matches or surpasses state-of-the-art erasure performance
Preserves non-target concepts more faithfully
Requires only a few seconds to apply
Abstract
While modern generative models such as diffusion-based architectures have enabled impressive creative capabilities, they also raise important safety and ethical risks. These concerns have led to growing interest in concept erasure, the process of removing unwanted concepts from model representations. Existing approaches often achieve strong erasure performance but rely on iterative optimization and may inadvertently distort unrelated concepts. In this work, we present a simple yet principled alternative: a linear transformation framework that achieves concept erasure analytically, without any training. Our method adapts a pretrained model through two sequential, closed-form steps: first, computing a proxy projection of the target concept, and second, applying a constrained transformation within the left null space of known concept directions. This design yields a deterministic and…
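The abstract is truncated here and does not give the paper's formulas, so the following is only a minimal NumPy sketch of the general idea of a closed-form edit constrained to the left null space of preserved concept directions. It is not the authors' implementation. The function name `erase_concept`, the shapes of `W`, `c`, and `K`, and in particular the choice of proxy (the projection of the target onto the span of the preserved directions) are assumptions made for illustration.

```python
import numpy as np

def erase_concept(W, c, K):
    """Hypothetical closed-form edit of a linear map W (out x d).

    c: (d,) embedding of the target concept to erase.
    K: (d, m) matrix whose columns are concept directions to preserve.
    Assumption: the proxy for c is its projection onto span(K).
    """
    # Step 1 (assumed): proxy projection of the target onto span(K).
    P_span = K @ np.linalg.pinv(K)      # orthogonal projector onto span(K)
    c_proxy = P_span @ c

    # Step 2: rank-one update whose row space lies in the left null
    # space of K, so every preserved direction is left unchanged.
    r = c - c_proxy                     # satisfies r @ K == 0 (numerically)
    denom = r @ c                       # equals ||r||^2
    if denom < 1e-12:
        raise ValueError("target lies in the span of preserved directions")
    delta = np.outer(W @ (c_proxy - c), r) / denom
    return W + delta

# Small random example with made-up dimensions.
rng = np.random.default_rng(0)
W = rng.standard_normal((8, 16))
K = rng.standard_normal((16, 4))
c = rng.standard_normal(16)
W_new = erase_concept(W, c, K)
```

Under these assumptions, `W_new @ K` equals `W @ K`, so preserved concepts produce the same outputs, while `W_new @ c` equals `W @ c_proxy`, so the target concept is mapped to the output of its proxy. The update involves no iterative optimization, only a pseudoinverse and an outer product.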
