CLIPErase: Efficient Unlearning of Visual-Textual Associations in CLIP

Tianyu Yang; Lisen Dai; Xiangqi Wang; Minhao Cheng; Yapeng Tian; Xiangliang Zhang

arXiv:2410.23330 · cs.CV · June 9, 2025

TL;DR

CLIPErase introduces an efficient method for unlearning specific visual-textual associations in CLIP, ensuring targeted forgetting without degrading overall model performance across multiple datasets and tasks.

Contribution

The paper presents CLIPErase, a novel approach that disentangles and selectively forgets visual and textual associations in CLIP, addressing the challenge of unlearning in multimodal models.

Findings

01. Effective unlearning of associations in CLIP demonstrated on CIFAR-100 and Flickr30K.

02. Preserves model performance on retain set after unlearning.

03. Achieves targeted forgetting in zero-shot multimodal tasks.

Abstract

Machine unlearning (MU) has gained significant attention as a means to remove specific data from trained models without requiring a full retraining process. While progress has been made in unimodal domains like text and image classification, unlearning in multimodal models remains relatively underexplored. In this work, we address the unique challenges of unlearning in CLIP, a prominent multimodal model that aligns visual and textual representations. We introduce CLIPErase, a novel approach that disentangles and selectively forgets both visual and textual associations, ensuring that unlearning does not compromise model performance. CLIPErase consists of three key modules: a Forgetting Module that disrupts the associations in the forget set, a Retention Module that preserves performance on the retain set, and a Consistency Module that maintains consistency with the original model.…
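
The three modules named in the abstract suggest a training objective with three terms. The abstract gives no formulas, so the following is only a minimal sketch under assumptions, not the authors' implementation: the forgetting term is assumed to be the mean cosine similarity of matched forget-set pairs, the retention term an image-to-text contrastive (InfoNCE) loss on the retain set, and the consistency term a mean squared distance to the original model's embeddings. All function names, the weights, and the temperature value are hypothetical.

```python
import math

# NOTE: every formula below is an assumption made for illustration;
# the abstract does not specify the actual CLIPErase losses.

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    # Cosine similarity; embeddings are assumed to be non-zero vectors.
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))

def forget_loss(img_embs, txt_embs):
    # Assumed forgetting term: mean similarity of matched forget-set pairs.
    # Minimizing it pushes each image away from its paired text.
    return sum(cosine(i, t) for i, t in zip(img_embs, txt_embs)) / len(img_embs)

def retain_loss(img_embs, txt_embs, temperature=0.07):
    # Assumed retention term: image-to-text InfoNCE on the retain set.
    # Each image should score highest against its own caption.
    total = 0.0
    for k, img in enumerate(img_embs):
        logits = [cosine(img, t) / temperature for t in txt_embs]
        m = max(logits)  # subtract the max for a stable log-sum-exp
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_z - logits[k]
    return total / len(img_embs)

def consistency_loss(new_embs, orig_embs):
    # Assumed consistency term: mean squared L2 distance between the
    # unlearned model's embeddings and the original model's embeddings.
    sq = [sum((a - b) ** 2 for a, b in zip(n, o))
          for n, o in zip(new_embs, orig_embs)]
    return sum(sq) / len(sq)

def combined_loss(forget_img, forget_txt,
                  retain_img, retain_txt,
                  retain_img_orig, retain_txt_orig,
                  w_forget=1.0, w_retain=1.0, w_consist=1.0):
    # Weighted sum of the three terms; the weights are hypothetical knobs.
    l_f = forget_loss(forget_img, forget_txt)
    l_r = retain_loss(retain_img, retain_txt)
    l_c = consistency_loss(retain_img + retain_txt,
                           retain_img_orig + retain_txt_orig)
    return w_forget * l_f + w_retain * l_r + w_consist * l_c
```

In this sketch the embeddings are plain lists of floats so the arithmetic is easy to follow; a real training loop would compute the same quantities on batched tensors from the CLIP image and text encoders and backpropagate through them.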

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training · Sparse Evolutionary Training