Can VLMs Truly Forget? Benchmarking Training-Free Visual Concept Unlearning
Zhangyun Tan, Zeliang Zhang, Susan Liang, Yolo Yunlong Tang, Lisha Chen, Chenliang Xu

TL;DR
This paper introduces VLM-UnBench, a comprehensive benchmark for evaluating training-free visual concept unlearning in vision-language models, revealing limitations of current suppression methods.
Contribution
It provides the first rigorous benchmark for training-free visual concept unlearning, analyzing effectiveness across multiple datasets, concepts, and model configurations.
Findings
Realistic unlearning prompts produce no significant forgetting relative to the no-instruction baseline.
Meaningful forgetting only occurs under oracle conditions with explicit concept disclosure.
Object and scene concepts are highly resistant to suppression, even with explicit instructions.
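To make the comparison behind these findings concrete, here is a minimal sketch of how a forgetting gap could be computed as the drop in probe accuracy relative to a no-instruction baseline. It is not the benchmark's actual metric or code: the condition names (`no_instruction`, `realistic_prompt`, `oracle_prompt`), the function names, and the toy outcomes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def accuracy(correct: List[bool]) -> float:
    """Fraction of probes the model answered correctly."""
    if not correct:
        raise ValueError("need at least one probe result")
    return sum(correct) / len(correct)


def forgetting_gap(
    results: Dict[str, List[bool]],
    baseline: str = "no_instruction",
) -> Dict[str, float]:
    """Accuracy drop of each condition relative to the baseline condition.

    A gap near 0 means the suppression prompt changed little; a large
    positive gap means the model stopped recognizing the concept.
    """
    base_acc = accuracy(results[baseline])
    return {
        name: base_acc - accuracy(outcomes)
        for name, outcomes in results.items()
        if name != baseline
    }


# Made-up toy outcomes (NOT results from the paper): one bool per probe,
# True when the model still identifies the supposedly forgotten concept.
toy_results = {
    "no_instruction": [True, True, True, True],
    "realistic_prompt": [True, True, True, True],
    "oracle_prompt": [True, False, False, False],
}

gaps = forgetting_gap(toy_results)
print(gaps)
```

Under this assumed definition, a realistic prompt that leaves accuracy unchanged yields a gap of zero, while an oracle prompt that names the concept explicitly yields a large gap, which is the qualitative pattern the findings describe.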
Abstract
VLMs trained on web-scale data retain sensitive and copyrighted visual concepts that deployment may require removing. Training-based unlearning methods share a structural flaw: fine-tuning on a narrow forget set degrades general capabilities before unlearning begins, making it impossible to attribute subsequent performance drops to the unlearning procedure itself. Training-free approaches sidestep this by suppressing concepts through prompts or system instructions, but no rigorous benchmark exists for evaluating them on visual tasks. We introduce VLM-UnBench, the first benchmark for training-free visual concept unlearning in VLMs. It covers four forgetting levels, 7 source datasets, and 11 concept axes, and pairs a three-level probe taxonomy with five evaluation conditions to separate genuine forgetting from instruction compliance. Across 8 evaluation settings and 13 VLM…
