Near-Optimal Algorithms for Differentially Private Online Learning in a Stochastic Environment
Bingshan Hu, Zhiming Huang, Nishant A. Mehta, Nidhi Hegde

TL;DR
This paper develops near-optimal differentially private algorithms for online learning in stochastic environments, achieving tight regret bounds for both bandit and full information feedback settings.
Contribution
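The paper introduces new algorithms for differentially private online learning in stochastic environments: UCB and Thompson Sampling-based bandit algorithms with optimal instance-dependent regret, and full information algorithms that match the paper's lower bounds up to logarithmic factors.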
Findings
Achieves optimal instance-dependent regret bounds for private bandit algorithms.
Establishes instance-dependent and minimax regret lower bounds for private full information learning.
Provides algorithms matching lower bounds up to logarithmic factors.
Abstract
In this paper, we study differentially private online learning problems in a stochastic environment under both bandit and full information feedback. For differentially private stochastic bandits, we propose both UCB and Thompson Sampling-based algorithms that are anytime and achieve the optimal $O\left(\sum_{j:\Delta_j>0} \frac{\ln(T)}{\min\{\Delta_j, \epsilon\}}\right)$ instance-dependent regret bound, where $T$ is the finite learning horizon, $\Delta_j$ denotes the suboptimality gap between the optimal arm and a suboptimal arm $j$, and $\epsilon$ is the required privacy parameter. For the differentially private full information setting with stochastic rewards, we show an instance-dependent regret lower bound and a minimax…
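To see how the privacy parameter enters the bandit bound, the leading term $\sum_{j:\Delta_j>0} \ln(T)/\min\{\Delta_j, \epsilon\}$ can be evaluated numerically. The sketch below is only an illustration with constants dropped: the function name `private_bandit_regret_term` and the example gaps are hypothetical and do not come from the paper, and the code computes the bound expression, not the regret of any algorithm.

```python
import math


def private_bandit_regret_term(gaps, epsilon, horizon):
    """Evaluate sum over arms with positive gap of ln(T) / min(gap, epsilon).

    Illustrative only: constants hidden by the O(.) notation are omitted.
    `gaps` holds the suboptimality gap of every arm (0 for the optimal arm).
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if horizon < 1:
        raise ValueError("horizon must be at least 1")
    log_t = math.log(horizon)
    # Arms with zero gap are optimal and contribute no regret.
    return sum(log_t / min(gap, epsilon) for gap in gaps if gap > 0)


# Hypothetical three-armed instance: one optimal arm, two suboptimal arms.
example_gaps = [0.0, 0.1, 0.5]
strict_privacy = private_bandit_regret_term(example_gaps, epsilon=0.2, horizon=1000)
loose_privacy = private_bandit_regret_term(example_gaps, epsilon=10.0, horizon=1000)
```

When $\epsilon$ is larger than every gap, the expression reduces to the familiar non-private form $\sum_j \ln(T)/\Delta_j$. An arm whose gap exceeds $\epsilon$ is instead charged $\ln(T)/\epsilon$, so in this expression privacy only raises the cost of arms that would otherwise be easy to rule out.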
Taxonomy
Topics: Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques
