Shuffle and Joint Differential Privacy for Generalized Linear Contextual Bandits
Sahasrajit Sarmasarkar

TL;DR
This paper introduces the first algorithms for generalized linear contextual bandits that ensure shuffle and joint differential privacy, addressing key challenges with new privacy-preserving optimization techniques.
Contribution
It develops novel private algorithms for GLM bandits under shuffle and joint differential privacy, handling both the lack of closed-form estimators and the need to track privacy across multiple evolving design matrices.
Findings
The shuffle-DP algorithm achieves regret close to the non-private rate for stochastic contexts.
The joint-DP algorithm matches the non-private regret for adversarial contexts, up to an additive privacy cost.
No spectral assumptions on the context distribution are needed beyond boundedness.
Abstract
We present the first algorithms for generalized linear contextual bandits under shuffle differential privacy and joint differential privacy. While prior work on private contextual bandits has been restricted to linear reward models -- which admit closed-form estimators -- generalized linear models (GLMs) pose fundamental new challenges: no closed-form estimator exists, requiring private convex optimization; privacy must be tracked across multiple evolving design matrices; and optimization error must be explicitly incorporated into regret analysis. We address these challenges under two privacy models and context settings. For stochastic contexts, we design a shuffle-DP algorithm achieving regret in dominant term, differing from the non-private rate by a factor of . For adversarial contexts, we provide a…
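To make the "no closed-form estimator" point concrete: a linear reward model can be estimated with a single regularized least-squares solve, whereas a GLM parameter has to be found by iterative convex optimization, and every iteration touches user data. The sketch below is a generic noisy gradient descent for a logistic GLM, not the paper's algorithm. The function names, the `clip_norm` and `noise_std` parameters, and the choice of the logistic link are all assumptions made for illustration, and the noise is not calibrated to any particular privacy guarantee.

```python
import math
import random


def sigmoid(z):
    """Logistic link mu(z) = 1 / (1 + exp(-z)), computed in a stable way."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def clip(vec, max_norm):
    """Rescale vec so that its Euclidean norm is at most max_norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    return [v * max_norm / norm for v in vec]


def noisy_glm_fit(contexts, rewards, steps=200, lr=0.5,
                  clip_norm=1.0, noise_std=0.0, seed=0):
    """Gradient descent on the logistic negative log-likelihood.

    Each per-example gradient is clipped to clip_norm, the clipped
    gradients are summed, and Gaussian noise with standard deviation
    noise_std is added to the sum before the update.

    noise_std is a hypothetical knob: this sketch does NOT derive it
    from an (epsilon, delta) budget.
    """
    rng = random.Random(seed)
    n = len(contexts)
    d = len(contexts[0])
    theta = [0.0] * d
    for _ in range(steps):
        grad_sum = [0.0] * d
        for x, r in zip(contexts, rewards):
            pred = sigmoid(sum(t * xi for t, xi in zip(theta, x)))
            # Gradient of the per-example loss is (mu(x . theta) - r) * x.
            per_example = clip([(pred - r) * xi for xi in x], clip_norm)
            grad_sum = [g + p for g, p in zip(grad_sum, per_example)]
        noisy_sum = [g + rng.gauss(0.0, noise_std) for g in grad_sum]
        theta = [t - lr * g / n for t, g in zip(theta, noisy_sum)]
    return theta
```

With `noise_std=0.0` this reduces to ordinary gradient descent for the maximum-likelihood estimate. The gap between the iterate returned after finitely many noisy steps and the exact optimum is the kind of optimization error the abstract says must be carried into the regret analysis.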
