KL-regularization Itself is Differentially Private in Bandits and RLHF

Yizhou Zhang; Kishan Panaganti; Laixi Shi; Juba Ziani; Adam Wierman

arXiv:2505.18407 · cs.LG · October 17, 2025

TL;DR

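This paper shows that, in offline multi-armed bandits, linear contextual bandits, and RLHF, adding KL-regularization to the learning objective makes the action sampled from the resulting stochastic policy differentially private, with no explicit noise injection and while retaining the performance benefits of regularization.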

Contribution

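The paper identifies KL-regularization, already a common component of decision-making algorithms, as a source of differential privacy in its own right: the randomness of the regularized stochastic policy provides the privacy guarantee, so no additional noise has to be injected.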

Findings

01. KL-regularization induces differential privacy in stochastic policies.

02. Privacy guarantees hold across bandits and reinforcement learning from human feedback.

03. Regularization preserves performance while ensuring privacy.

Abstract

Differential Privacy (DP) provides a rigorous framework for privacy, ensuring the outputs of data-driven algorithms remain statistically indistinguishable across datasets that differ in a single entry. While guaranteeing DP generally requires explicitly injecting noise either to the algorithm itself or to its outputs, the intrinsic randomness of existing algorithms presents an opportunity to achieve DP "for free". In this work, we explore the role of regularization in achieving DP across three different decision-making problems: multi-armed bandits, linear contextual bandits, and reinforcement learning from human feedback (RLHF), in offline data settings. We show that adding KL-regularization to the learning objective (a common approach in optimization algorithms) makes the action sampled from the resulting stochastic policy itself differentially private. This offers a new route to…
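The mechanism described in the abstract can be illustrated with a small sketch for the offline multi-armed bandit case. This is not the paper's algorithm or its privacy bound. It relies only on the standard fact that maximizing expected reward minus (1/eta) times the KL divergence to a reference policy gives a policy proportional to ref_policy * exp(eta * reward). The function names, the uniform reference policy, the value of eta, the synthetic data, and the simplified notion of a neighboring dataset (one reward value changed, rewards in [0, 1]) are all assumptions made for illustration.

```python
import numpy as np

def kl_regularized_policy(reward_estimates, ref_policy, eta):
    """Closed-form maximizer of E_pi[r] - (1/eta) * KL(pi || ref_policy)."""
    logits = np.log(ref_policy) + eta * reward_estimates
    logits = logits - logits.max()  # shift for numerical stability
    weights = np.exp(logits)
    return weights / weights.sum()

def empirical_means(arms, rewards, n_arms):
    """Per-arm average reward from offline data (arms without data get 0)."""
    sums = np.bincount(arms, weights=rewards, minlength=n_arms)
    counts = np.bincount(arms, minlength=n_arms)
    return sums / np.maximum(counts, 1)

# Hypothetical offline dataset: (arm, reward) pairs with rewards in [0, 1].
rng = np.random.default_rng(0)
n_arms, n_samples, eta = 3, 300, 5.0
arms = rng.integers(0, n_arms, size=n_samples)
rewards = rng.uniform(0.0, 1.0, size=n_samples)
ref_policy = np.full(n_arms, 1.0 / n_arms)  # assumed uniform reference

# Neighboring dataset: identical except for the reward of one entry.
rewards_neighbor = rewards.copy()
rewards_neighbor[0] = 1.0 - rewards[0]

pi = kl_regularized_policy(empirical_means(arms, rewards, n_arms), ref_policy, eta)
pi_neighbor = kl_regularized_policy(
    empirical_means(arms, rewards_neighbor, n_arms), ref_policy, eta
)

# The released output is a single sampled action, not the policy itself.
action = rng.choice(n_arms, p=pi)

# Largest log-ratio of action probabilities between the two datasets.
privacy_loss = np.max(np.abs(np.log(pi) - np.log(pi_neighbor)))
print(f"sampled action: {action}, observed privacy loss: {privacy_loss:.4f}")
```

In this simplified setting, changing one reward moves one arm's empirical mean by at most 1/count, so the observed log-ratio is at most eta divided by the number of samples for that arm. A smaller eta (stronger regularization) keeps the policy closer to the reference and makes the sampled action less sensitive to any single data point.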

Taxonomy

Topics: Advanced Bandit Algorithms Research