SALMUBench: A Benchmark for Sensitive Association-Level Multimodal Unlearning

Cai Selvas-Sala; Lei Kang; Lluis Gomez

arXiv:2603.26316·cs.CV·March 30, 2026


TL;DR

SALMUBench is a new benchmark for evaluating fine-grained unlearning of sensitive association data in multimodal models, highlighting current methods' limitations and providing tools for future research.

Contribution

Introduces SALMUBench, a synthetic dataset and evaluation protocol for association-level unlearning in multimodal models, with publicly available resources.

Findings

01. Current unlearning methods either fail to forget effectively or over-generalize.

02. SALMUBench reveals distinct failure modes in existing unlearning techniques.

03. The benchmark sets a new standard for comprehensive evaluation of unlearning methods.

Abstract

As multimodal models like CLIP become integral to downstream systems, the need to remove sensitive information is critical. However, machine unlearning for contrastively-trained encoders remains underexplored, and existing evaluations fail to diagnose fine-grained, association-level forgetting. We introduce SALMUBench (Sensitive Association-Level Multimodal Unlearning), a benchmark built upon a synthetic dataset of 60K persona-attribute associations and two foundational models: a Compromised model polluted with this data, and a Clean model without it. To isolate unlearning effects, both are trained from scratch on the same 400M-pair retain base, with the Compromised model additionally trained on the sensitive set. We propose a novel evaluation protocol with structured holdout sets (holdout identity, holdout association) to precisely measure unlearning efficacy and collateral damage. Our…
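The comparison against a Clean reference model on separate evaluation splits can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the split names, the toy 2-D embeddings, and the `gap_to_clean` score (difference in mean image-text cosine similarity) are hypothetical and are not the metrics or API defined by SALMUBench.

```python
import math
from statistics import mean


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def association_score(pairs):
    """Mean image-text cosine similarity over (image_emb, text_emb) pairs."""
    return mean(cosine(img, txt) for img, txt in pairs)


def gap_to_clean(unlearned, clean):
    """Per-split score difference: unlearned model minus Clean reference.

    Both arguments map a split name to a list of (image_emb, text_emb)
    pairs produced by the respective model (hypothetical layout).
    """
    return {
        split: association_score(unlearned[split]) - association_score(clean[split])
        for split in clean
    }


# Toy embeddings (hypothetical): the unlearned model still aligns the
# forget-set pair, while the Clean model does not.
unlearned = {
    "forget": [([1.0, 0.0], [1.0, 0.0])],
    "holdout_identity": [([1.0, 0.0], [1.0, 0.0])],
}
clean = {
    "forget": [([1.0, 0.0], [0.0, 1.0])],
    "holdout_identity": [([1.0, 0.0], [1.0, 0.0])],
}

gaps = gap_to_clean(unlearned, clean)
```

Under these assumptions, a positive gap on the forget split suggests the sensitive association is still stronger than in a model that never saw it, while a gap on a holdout split that departs from zero would hint at collateral damage.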


Code & Models

Repositories

cvc-mmu/salmubench (GitHub)

