Corrupt Bandits for Preserving Local Privacy
Pratik Gajane, Tanguy Urvoy, Emilie Kaufmann

TL;DR
This paper introduces a new corrupted bandit model motivated by privacy in recommender systems, providing theoretical regret bounds, privacy guarantees, and experimental validation for algorithms designed to operate under reward corruption.
Contribution
It develops the first regret bounds for corrupted bandits, proposes two algorithms (the frequentist KLUCB-CF and the Bayesian TS-CF), and analyzes the trade-off between local privacy and learning performance.
Findings
A lower bound on expected regret is established for any bandit algorithm in the corrupted setting, along with upper bounds for KLUCB-CF and TS-CF.
Corruption parameters are given that guarantee a desired level of local privacy, and their impact on regret is analyzed.
Experimental results confirm the theoretical analysis.
Abstract
We study a variant of the stochastic multi-armed bandit (MAB) problem in which the rewards are corrupted. In this framework, motivated by privacy preservation in online recommender systems, the goal is to maximize the sum of the (unobserved) rewards, based on the observation of a transformation of these rewards through a stochastic corruption process with known parameters. We provide a lower bound on the expected regret of any bandit algorithm in this corrupted setting. We devise a frequentist algorithm, KLUCB-CF, and a Bayesian algorithm, TS-CF, and give upper bounds on their regret. We also provide the appropriate corruption parameters to guarantee a desired level of local privacy and analyze how this impacts the regret. Finally, we present some experimental results that confirm our analysis.
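To make the idea of a corruption process with known parameters concrete, here is a minimal sketch that assumes binary (Bernoulli) rewards and uses classical randomized response as the corruption mechanism. This summary does not specify the paper's own parametrization, so the function names, the choice of randomized response, and the keep probability e^ε / (1 + e^ε) are illustrative assumptions rather than the authors' method. The sketch shows why known corruption parameters matter: the learner sees only the corrupted feedback, yet can invert the corruption to estimate the true mean reward.

```python
import math
import random

# Illustrative assumption: binary rewards corrupted by randomized response.
# None of these names come from the paper.

def rr_keep_probability(epsilon):
    """Probability of reporting the true bit so that the ratio
    keep / (1 - keep) equals exp(epsilon)."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))

def corrupt(reward, keep_prob, rng):
    """Report the true binary reward with probability keep_prob, else flip it."""
    return reward if rng.random() < keep_prob else 1 - reward

def corrupted_mean(mu, keep_prob):
    """Mean of the observed feedback when the true Bernoulli mean is mu."""
    return keep_prob * mu + (1.0 - keep_prob) * (1.0 - mu)

def recover_mean(lam, keep_prob):
    """Invert corrupted_mean; only valid when keep_prob != 0.5."""
    return (lam - (1.0 - keep_prob)) / (2.0 * keep_prob - 1.0)

def estimate_true_mean(mu, epsilon, n, seed=0):
    """Simulate n corrupted observations of one arm and estimate its true mean."""
    rng = random.Random(seed)
    keep = rr_keep_probability(epsilon)
    total = 0
    for _ in range(n):
        reward = 1 if rng.random() < mu else 0  # unobserved true reward
        total += corrupt(reward, keep, rng)     # what the learner sees
    return recover_mean(total / n, keep)
```

Under these assumptions, a smaller ε pushes the keep probability toward 0.5, which shrinks the denominator in `recover_mean` and inflates the variance of the estimate. This is one way to see the trade-off between privacy and regret that the abstract refers to.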
Taxonomy
Topics: Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques
