The Price of Differential Privacy For Online Learning
Naman Agarwal, Karan Singh

TL;DR
This paper develops differentially private algorithms for online linear optimization, achieving near-optimal regret bounds and demonstrating that privacy can be incorporated with minimal additional cost in certain settings.
Contribution
It introduces differentially private algorithms for online linear optimization with optimal regret bounds, improving on previous bounds, especially in the bandit setting.
Findings
The full-information setting achieves regret of O(√T) + Õ(1/ε).
The bandit setting achieves regret of Õ((1/ε)√T), improving over previous bounds.
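In the full-information setting, privacy costs only an additive term of order 1/ε (up to logarithmic factors), so the leading O(√T) regret rate is unchanged; in the bandit setting the 1/ε factor multiplies the √T rate.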
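To make the difference between the additive and multiplicative privacy cost concrete, the following minimal sketch evaluates both bounds numerically. It assumes all hidden constants and logarithmic factors equal 1, which the paper does not claim, and the function names (`full_info_bound`, `bandit_bound`, `privacy_overhead`) are hypothetical helpers introduced only for this illustration.

```python
import math


def full_info_bound(T, eps):
    # Full-information rate sqrt(T) + 1/eps.
    # Assumption: constants and log factors are set to 1 for illustration.
    return math.sqrt(T) + 1.0 / eps


def bandit_bound(T, eps):
    # Bandit rate (1/eps) * sqrt(T), under the same simplifying assumption.
    return math.sqrt(T) / eps


def privacy_overhead(T, eps):
    # Ratio of the private full-information bound to the non-private sqrt(T) rate.
    return full_info_bound(T, eps) / math.sqrt(T)


if __name__ == "__main__":
    eps = 0.1
    for T in (10**2, 10**4, 10**6):
        full_ratio = privacy_overhead(T, eps)
        bandit_ratio = bandit_bound(T, eps) / math.sqrt(T)
        print(f"T={T}: full-info ratio={full_ratio:.3f}, bandit ratio={bandit_ratio:.3f}")
```

Under these assumptions the full-information ratio tends to 1 as T grows, which is the sense in which privacy comes "for free", while the bandit ratio stays fixed at 1/ε.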
Abstract
We design differentially private algorithms for the problem of online linear optimization in the full information and bandit settings with optimal regret bounds. In the full-information setting, our results demonstrate that ε-differential privacy may be ensured for free -- in particular, the regret bounds scale as O(√T) + Õ(1/ε). For bandit linear optimization, and as a special case, for non-stochastic multi-armed bandits, the proposed algorithm achieves a regret of Õ((1/ε)√T), while the previously known best regret bound was Õ((1/ε)T^{2/3}).
Taxonomy
Topics: Advanced Bandit Algorithms Research · Privacy-Preserving Technologies in Data · Mobile Crowdsensing and Crowdsourcing
