On Private and Robust Bandits
Yulian Wu, Xingyu Zhou, Youming Tao, Di Wang

TL;DR
This paper studies multi-armed bandits in which the rewards are heavy-tailed and subject to Huber contamination, and the agent must also satisfy differential privacy. It gives a minimax regret lower bound and a truncation-based meta-algorithm with nearly-optimal regret.
Contribution
The paper presents the first minimax lower bound for private heavy-tailed MABs and proposes a meta-algorithm that achieves nearly-optimal regret using reward truncation and the Laplace mechanism.
Findings
Achieves nearly-optimal regret bounds in heavy-tailed, contaminated, private settings.
Provides the first minimax lower bound for private heavy-tailed MABs.
Demonstrates the effectiveness of the truncation-based private and robust mean estimation (PRM) schemes through experiments.
Abstract
We study private and robust multi-armed bandits (MABs), where the agent receives Huber's contaminated heavy-tailed rewards and meanwhile needs to ensure differential privacy. We first present its minimax lower bound, characterizing the information-theoretic limit of regret with respect to privacy budget, contamination level and heavy-tailedness. Then, we propose a meta-algorithm that builds on a private and robust mean estimation sub-routine PRM that essentially relies on reward truncation and the Laplace mechanism only. For two different heavy-tailed settings, we give specific schemes of PRM, which enable us to achieve nearly-optimal regret. As by-products of our main results, we also give the first minimax lower bound for private heavy-tailed MABs (i.e., without contamination). Moreover, our two proposed truncation-based PRM achieve the optimal trade-off…
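As a rough illustration of the idea behind the PRM sub-routine described in the abstract (reward truncation followed by the Laplace mechanism), the Python sketch below clips each reward to a fixed range, averages the clipped values, and adds Laplace noise calibrated to the sensitivity of that average. The function name private_truncated_mean, the threshold parameter, and the fixed symmetric clipping range are assumptions made for illustration. The paper's actual PRM schemes, including how the truncation threshold is chosen in each heavy-tailed setting, are not reproduced here.

```python
import numpy as np


def private_truncated_mean(rewards, threshold, epsilon, rng=None):
    """Illustrative sketch: truncated mean released via the Laplace mechanism.

    `threshold` and `epsilon` are hypothetical parameters for this example;
    this is not the paper's PRM scheme, only the general recipe it names.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(rewards, dtype=float)
    if x.size == 0:
        raise ValueError("need at least one reward")
    if threshold <= 0 or epsilon <= 0:
        raise ValueError("threshold and epsilon must be positive")

    # Truncation: every reward is forced into [-threshold, threshold],
    # so heavy-tailed or contaminated samples cannot dominate the average.
    clipped = np.clip(x, -threshold, threshold)

    # Replacing one reward changes the clipped mean by at most 2 * threshold / n.
    sensitivity = 2.0 * threshold / x.size

    # Laplace mechanism: noise scale = sensitivity / epsilon.
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return float(clipped.mean() + noise)
```

Clipping bounds the influence of any single reward on the mean by 2 * threshold / n, which is what allows Laplace noise of scale 2 * threshold / (n * epsilon) to make the released estimate epsilon-differentially private. A larger threshold reduces truncation bias but increases the noise, so the threshold controls a bias-noise trade-off.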
Taxonomy
Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · Cognitive Radio Networks and Spectrum Sensing
