ROKA: Robust Knowledge Unlearning against Adversaries
Jinmyeong Shin, Joshua Tapia, Nicholas Ferreira, Gabriel Diaz, Moayed Daneshyari, Hyeran Jeon

TL;DR
This paper introduces ROKA, an unlearning method that preserves related knowledge and defends against indirect unlearning attacks by balancing knowledge influence within the model. It comes with theoretical guarantees and is evaluated across several large models.
Contribution
The authors present ROKA as the first unlearning method to provide a theoretical framework and guarantee for knowledge preservation during unlearning, addressing both knowledge contamination and the security threats that follow from it.
Findings
ROKA effectively unlearns targeted data while maintaining overall model accuracy.
ROKA mitigates indirect unlearning attacks by balancing knowledge influence.
Evaluations show ROKA improves security and performance across various large models.
Abstract
Machine unlearning is critical for data privacy, yet existing methods often cause Knowledge Contamination by unintentionally damaging related knowledge. This post-unlearning degradation in model performance has recently been leveraged for new inference and backdoor attacks. Most studies design adversarial unlearning requests that require poisoning or duplicating training data. In this study, we introduce a new unlearning-induced attack model, the indirect unlearning attack, which requires no data manipulation and instead exploits the consequences of knowledge contamination to perturb model accuracy on security-critical predictions. To mitigate this attack, we introduce a theoretical framework that models neural networks as Neural Knowledge Systems. Based on this framework, we propose ROKA, a robust unlearning strategy centered on Neural Healing. Unlike conventional unlearning methods…
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Privacy-Preserving Technologies in Data · Advanced Graph Neural Networks
