Unlearning Offline Stochastic Multi-Armed Bandits
Zichun Ye, Runqi Wang, Xuchuang Wang, Xutong Liu, Shuai Li, Mohammad Hajiesmaili

TL;DR
This paper pioneers the study of machine unlearning in offline stochastic multi-armed bandits, proposing algorithms with performance guarantees to balance privacy and decision quality.
Contribution
It formalizes privacy constraints for offline MAB, introduces adaptive unlearning algorithms, and provides theoretical and experimental validation.
Findings
The proposed algorithms achieve privacy-utility tradeoffs consistent with the theoretical bounds.
Adaptive algorithms outperform fixed strategies in various data regimes.
Experiments confirm the effectiveness of proposed unlearning methods.
Abstract
Machine unlearning aims to unlearn data points from a learned model, offering a principled way to process data-deletion requests and mitigate privacy risks without full retraining. Prior work has mainly studied unlearning in unsupervised and supervised settings, leaving unlearning for sequential decision-making systems far less understood. We initiate the study of unlearning in a foundational sequential decision-making problem: offline stochastic multi-armed bandits (MAB). We formalize the privacy constraint for offline MAB and measure utility by the post-unlearning decision quality. We conduct a systematic study of both single- and multi-source unlearning scenarios under two data-generation models, the fixed-sample model and the distribution model. For these settings, our algorithmic design is built on two canonical base algorithms, the Gaussian mechanism and rollback, and we propose adaptive algorithms…
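To make the two base algorithms named in the abstract more concrete, here is a minimal Python sketch of what rollback and a Gaussian mechanism could look like on an offline bandit dataset. It is an illustration under stated assumptions, not the paper's algorithms: the function names (`rollback_unlearn`, `gaussian_unlearn`, `select_arm`), the per-arm reward lists, the boolean forget masks, and the noise scale `sigma` are all hypothetical, and the abstract does not say how the noise is calibrated or how the adaptive algorithms choose between the two.

```python
import numpy as np


def empirical_means(rewards_by_arm):
    """Return the sample mean reward of each arm."""
    return np.array([np.mean(r) for r in rewards_by_arm])


def rollback_unlearn(rewards_by_arm, forget_mask_by_arm):
    """Rollback-style sketch: recompute means from retained samples only.

    forget_mask_by_arm[i][j] is True when sample j of arm i must be forgotten.
    Assumption: every arm keeps at least one sample after deletion.
    """
    retained = []
    for rewards, mask in zip(rewards_by_arm, forget_mask_by_arm):
        kept = np.asarray(rewards, dtype=float)[~np.asarray(mask, dtype=bool)]
        if kept.size == 0:
            raise ValueError("an arm has no retained samples")
        retained.append(kept)
    return empirical_means(retained)


def gaussian_unlearn(rewards_by_arm, sigma, rng):
    """Gaussian-mechanism-style sketch: perturb the original means with noise.

    sigma is a hypothetical noise scale; calibrating it to a privacy
    constraint is outside the scope of this sketch.
    """
    means = empirical_means(rewards_by_arm)
    return means + rng.normal(0.0, sigma, size=means.shape)


def select_arm(means):
    """Post-unlearning decision: pick the arm with the largest estimate."""
    return int(np.argmax(means))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    rewards = [[1.0, 0.0, 1.0, 1.0], [0.0, 0.0, 1.0]]
    forget = [[False, False, False, True], [False, False, True]]
    print("rollback arm:", select_arm(rollback_unlearn(rewards, forget)))
    print("gaussian arm:", select_arm(gaussian_unlearn(rewards, 0.1, rng)))
```

In this sketch, rollback spends computation to remove the influence of the deleted samples exactly, while the Gaussian variant keeps the original statistics and masks them with noise, which can degrade the quality of the selected arm as `sigma` grows.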
