Robust Concept Erasure Using Task Vectors

Minh Pham; Kelly O. Marshall; Chinmay Hegde; Niv Cohen

arXiv:2404.03631 · cs.CV · February 21, 2025 · 2 cites

TL;DR

This paper introduces a robust method for concept erasure in text-to-image models based on Task Vectors (TV). It improves safety while preserving the model's core functionality by using Diverse Inversion to estimate the required edit strength.

Contribution

The paper proposes Diverse Inversion to estimate the required edit strength, and applies the concept-erasure edit selectively, improving robustness while preserving model performance.

Findings

01. TV-based erasure is more robust to unexpected user prompts than input-dependent erasure methods.

02. Diverse Inversion improves the estimate of the required edit strength.

03. Selective weight editing improves erasure effectiveness while preserving the model's core functions.
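Findings 01 and 03 rest on editing weights directly rather than conditioning on the prompt. The following is a minimal sketch, assuming the standard task-arithmetic negation: a task vector tau = theta_ft - theta_pre is obtained by fine-tuning the model on the unwanted concept, then subtracted from the pretrained weights with strength alpha. The parameter names and the `selected` set are hypothetical; the text here does not say which weights the authors actually edit.

```python
from typing import Optional

import numpy as np

Params = dict[str, np.ndarray]  # parameter name -> weight tensor


def task_vector(pretrained: Params, finetuned: Params) -> Params:
    # tau = theta_ft - theta_pre, computed tensor by tensor
    return {name: finetuned[name] - pretrained[name] for name in pretrained}


def erase_concept(pretrained: Params, tau: Params, alpha: float,
                  selected: Optional[set[str]] = None) -> Params:
    # Negated task vector: theta_erased = theta_pre - alpha * tau.
    # If `selected` is given (a hypothetical selection rule), only those
    # tensors are edited; every other tensor keeps its pretrained value.
    edited = {}
    for name, weights in pretrained.items():
        if selected is None or name in selected:
            edited[name] = weights - alpha * tau[name]
        else:
            edited[name] = weights.copy()
    return edited
```

For example, with toy tensors `pre = {"attn.w": np.array([1.0, 2.0]), "ff.w": np.array([3.0])}` and a fine-tuned copy `ft = {"attn.w": np.array([2.0, 2.5]), "ff.w": np.array([4.0])}`, calling `erase_concept(pre, task_vector(pre, ft), 2.0, selected={"attn.w"})` moves only `attn.w` (to `[-1.0, 1.0]`) and leaves `ff.w` untouched. Restricting the edit to a subset of tensors is one way to limit collateral damage to the model's other capabilities.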

Abstract

With the rapid growth of text-to-image models, a variety of techniques have been suggested to prevent undesirable image generations. Yet, these methods often only protect against specific user prompts and have been shown to allow unsafe generations with other inputs. Here we focus on unconditionally erasing a concept from a text-to-image model rather than conditioning the erasure on the user's prompt. We first show that compared to input-dependent erasure methods, concept erasure that uses Task Vectors (TV) is more robust to unexpected user inputs, not seen during training. However, TV-based erasure can also affect the core performance of the edited model, particularly when the required edit strength is unknown. To this end, we propose a method called Diverse Inversion, which we use to estimate the required strength of the TV edit. Diverse Inversion finds within the model input space a…
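The abstract is cut off before it explains how the inverted embeddings yield an edit strength. One plausible reading, offered here as an assumption rather than the authors' procedure, is a sweep: try increasing values of alpha and keep the first one for which none of the diverse inverted embeddings still triggers the concept. `concept_present` is a hypothetical callback; in practice it might generate an image from the edited model using that embedding and run a concept classifier on the result.

```python
from typing import Callable, Optional, Sequence, TypeVar

E = TypeVar("E")  # type of an inverted embedding (e.g. a vector)


def estimate_edit_strength(
    concept_present: Callable[[float, E], bool],
    embeddings: Sequence[E],
    alphas: Sequence[float],
) -> Optional[float]:
    # Try candidate edit strengths from weakest to strongest and return the
    # first one for which no inverted embedding still produces the concept.
    for alpha in sorted(alphas):
        if not any(concept_present(alpha, emb) for emb in embeddings):
            return alpha
    # No candidate strength erased the concept for every embedding.
    return None
```

Returning the smallest sufficient alpha mirrors the trade-off the abstract describes: a stronger TV edit erases more reliably but can degrade the model's core performance, so the sweep stops as soon as every known trigger is neutralized.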
