Rethinking Post-Unlearning Behavior of Large Vision-Language Models
Minsung Kim, Nakyeong Yang, Kyomin Jung

TL;DR
This paper introduces PUBG, an unlearning method for large vision-language models (LVLMs) that preserves privacy while keeping responses informative and visually grounded, avoiding the degenerate, hallucinated, or excessively refused outputs the authors call Unlearning Aftermaths.
Contribution
The paper proposes a new unlearning task for LVLMs, together with the PUBG method, both aimed at improving post-unlearning response quality and privacy safety.
Findings
PUBG effectively mitigates unlearning aftermaths.
Existing methods fail to produce informative responses after unlearning.
PUBG prevents privacy leakage while maintaining response quality.
Abstract
Large Vision-Language Models (LVLMs) can recognize individuals in images and disclose sensitive personal information about them, raising critical privacy concerns. Machine unlearning aims to remove such knowledge from the model. However, existing methods rarely prescribe what the model should output in place of the forgotten content, leading to Unlearning Aftermaths: degenerate, hallucinated, or excessively refused responses. We argue that, especially for generative LVLMs, it is crucial to consider the quality and informativeness of post-unlearning responses rather than relying solely on naive suppression. To address this, we introduce a new unlearning task for LVLMs that requires models to provide privacy-preserving yet informative and visually grounded responses. We also propose PUBG, a novel unlearning method that explicitly guides post-unlearning behavior toward a desirable output…
