TL;DR
HFRU is a reinforcement unlearning framework that removes sensitive visual knowledge from vision-language models by operating on the vision encoder rather than the language decoder, while introducing negligible object hallucination.
Contribution
HFRU introduces a novel two-stage reinforcement unlearning method targeting the vision encoder to achieve deep semantic removal of sensitive data.
Findings
Achieves over 98% on both forgetting and retention.
Significantly reduces object hallucination compared to prior methods.
Effective on object recognition and face identity tasks.
Abstract
Vision-language models (VLMs) raise growing concerns about privacy, copyright, and bias, motivating machine unlearning to remove sensitive knowledge. However, existing methods primarily fine-tune the language decoder, leading to superficial forgetting that fails to erase underlying visual representations and often introduces object hallucination. We propose HFRU, a reinforcement unlearning framework that operates on the vision encoder for deep semantic removal. Our two-stage approach combines alignment disruption with GRPO-based optimization using a composite reward, including an abstraction reward that encourages semantically valid substitutions and mitigates hallucinations. Experiments on object recognition and face identity tasks show that HFRU achieves over 98% forgetting and retention performance, while introducing negligible object hallucination, significantly outperforming prior…
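The abstract names GRPO-based optimization with a composite reward but gives no formulas here. The following is a minimal sketch of the general idea, not the authors' implementation: it assumes the composite reward is a weighted sum of three scalar components (the names `forget`, `retain`, `abstraction` and the equal weights are hypothetical) and uses the standard GRPO-style step of standardizing rewards within a group of responses sampled for the same input. The use of the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def composite_reward(forget, retain, abstraction, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of three hypothetical reward components.

    The component names and the equal default weights are illustrative
    assumptions; the abstract does not specify how the reward is composed.
    """
    w_f, w_r, w_a = weights
    return w_f * forget + w_r * retain + w_a * abstraction


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize each reward within its group.

    Each response is scored relative to the other responses sampled for
    the same input, so no separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four hypothetical sampled responses for one input, scored component-wise.
group = [
    composite_reward(1.0, 1.0, 1.0),  # forgets target, keeps others, valid substitute
    composite_reward(1.0, 1.0, 0.0),  # forgets, but substitute is not semantically valid
    composite_reward(0.0, 1.0, 0.0),  # still names the forgotten concept
    composite_reward(0.0, 0.0, 0.0),  # fails on every component
]
advantages = group_relative_advantages(group)
```

In this sketch, responses that score above the group mean receive positive advantages and are reinforced, and those below the mean are discouraged. If every response in a group earns the same reward, all advantages are zero and that group contributes no learning signal.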
