ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition
Shen Lin, Jing Lin, Junhao Dong, Piotr Koniusz, Li Xu

TL;DR
This paper introduces a concept-level unlearning method for vision-language models that enables precise removal of specific knowledge while preserving unrelated semantics, using interpretable concept decomposition.
Contribution
The paper proposes a framework that constructs a task-specific concept vocabulary and decomposes visual representations into semantic concepts, enabling fine-grained unlearning in VLMs.
Findings
Enables more comprehensive target forgetting.
Better preserves non-target knowledge within images.
Maintains competitive model utility.
Abstract
Machine unlearning in Vision-Language Models (VLMs) is typically performed at the image or instance level, making it difficult to precisely remove target knowledge without affecting unrelated semantics. This issue is especially pronounced since a single image often contains multiple entangled concepts, including both target concepts to be forgotten and contextual information that should be preserved. In this paper, we propose an interpretable concept-level unlearning framework for VLMs, which constructs a compact task-specific concept vocabulary from the forgetting set using a multimodal large language model. In addition to modality alignment, visual representations are decomposed into sparse, nonnegative combinations of semantic concepts, providing an explicit interface for fine-grained knowledge manipulation. Based on this decomposition, our method formulates unlearning as…
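To make the decomposition step in the abstract concrete, here is a minimal sketch of writing one visual embedding as a sparse, nonnegative combination of concept embeddings. It is an illustration under stated assumptions, not the authors' implementation: the solver (projected proximal gradient on an L1-penalized least-squares objective), the function name `decompose_nonneg_sparse`, the penalty `lam`, and the random "concept" vectors are all hypothetical choices. The final step, which zeroes one coefficient, only shows what an explicit per-concept interface could look like; the paper's actual unlearning objective is cut off in the abstract above and is not reproduced here.

```python
import numpy as np

def decompose_nonneg_sparse(v, concepts, lam=0.05, n_iter=500):
    """Approximately solve  min_{w >= 0}  0.5 * ||v - w @ concepts||^2 + lam * sum(w).

    v        : (d,) visual embedding
    concepts : (k, d) one embedding per concept (hypothetical dictionary)
    Returns a (k,) vector of nonnegative coefficients.
    """
    gram = concepts @ concepts.T              # (k, k)
    step = 1.0 / np.linalg.norm(gram, 2)      # 1 / Lipschitz constant of the gradient
    cv = concepts @ v                         # (k,)
    w = np.zeros(concepts.shape[0])
    for _ in range(n_iter):
        grad = gram @ w - cv                  # gradient of the least-squares term
        # Gradient step, L1 shrinkage, and projection onto w >= 0 in one expression.
        w = np.maximum(0.0, w - step * (grad + lam))
    return w

# Toy data: 6 random unit-norm "concept" vectors in a 32-dimensional space (assumed).
rng = np.random.default_rng(0)
concepts = rng.normal(size=(6, 32))
concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)

# An embedding built from concepts 0 and 2 only.
true_w = np.array([0.8, 0.0, 0.5, 0.0, 0.0, 0.0])
v = true_w @ concepts

w = decompose_nonneg_sparse(v, concepts)

# Illustrative edit: drop the coefficient of a "target" concept (index 0)
# and rebuild the embedding from the remaining concepts.
w_forget = w.copy()
w_forget[0] = 0.0
v_forget = w_forget @ concepts
```

Because every coefficient is tied to a named concept, an edit such as the one on `w_forget` touches only the target concept's contribution and leaves the other coefficients unchanged, which is the kind of fine-grained control the abstract attributes to the decomposition.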
