Efficient Differentially Private Fine-Tuning of LLMs via Reinforcement Learning

Afshin Khadangi; Amir Sartipi; Igor Tchappi; Ramin Bahmani; Gilbert Fridgen

arXiv:2507.22565·cs.LG·July 31, 2025

TL;DR

This paper introduces RLDP, a reinforcement learning framework that adaptively manages privacy-preserving fine-tuning of large language models, improving utility and efficiency while maintaining formal differential privacy guarantees.

Contribution

RLDP is the first method to treat differential privacy optimization as a closed-loop control problem using deep RL, enabling adaptive privacy-utility trade-offs during LLM fine-tuning.

Findings

01. Achieves 5.6% utility improvement on average.

02. Speeds up training by 71% compared to baselines.

03. Maintains privacy guarantees while reducing vulnerability to attacks.

Abstract

The tension between data privacy and model utility has become the defining bottleneck for the practical deployment of large language models (LLMs) trained on sensitive corpora, including healthcare data. Differentially private stochastic gradient descent (DP-SGD) guarantees formal privacy, yet it does so at a pronounced cost: gradients are forcibly clipped and perturbed with noise, degrading sample efficiency and final accuracy. Numerous variants have been proposed to soften this trade-off, but they all share a handicap: their control knobs are hard-coded, global, and oblivious to the evolving optimization landscape. Consequently, practitioners are forced either to overspend their privacy budget in pursuit of utility or to accept mediocre models in order to stay within privacy constraints. We present RLDP, the first framework to cast DP optimization itself as a closed-loop control problem…
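To make the cost described in the abstract concrete, here is a minimal sketch of one standard DP-SGD gradient step: each per-sample gradient is clipped to a fixed L2 norm, and Gaussian noise is added to the sum before averaging. This illustrates the generic baseline mechanism only, not RLDP itself. The names `dp_sgd_step`, `clip_norm`, and `noise_multiplier` are illustrative assumptions, and both knobs are held constant here, which is the hard-coded behavior the abstract criticizes.

```python
import math
import random


def dp_sgd_step(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Return a privatized average gradient (hypothetical helper).

    per_sample_grads: list of equal-length lists, one gradient per example.
    clip_norm: maximum L2 norm allowed for any single example's gradient.
    noise_multiplier: Gaussian noise std, expressed in units of clip_norm.
    """
    dim = len(per_sample_grads[0])
    total = [0.0] * dim
    for grad in per_sample_grads:
        norm = math.sqrt(sum(x * x for x in grad))
        # Shrink gradients whose norm exceeds clip_norm; leave others as-is.
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, x in enumerate(grad):
            total[i] += x * scale
    # Noise is added once to the clipped sum, then the result is averaged.
    sigma = noise_multiplier * clip_norm
    n = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / n for t in total]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms: 5.0 and 0.5

# With the noise switched off, only the clipping effect remains visible.
noiseless = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=0.0, rng=rng)

# With noise on, the same step returns a perturbed estimate.
noisy = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

In the noiseless call, the first gradient is scaled down from norm 5.0 to norm 1.0 while the second passes through unchanged, so the large example's influence on the update is capped. That lost signal, plus the added noise, is the utility cost the abstract refers to.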

Code & Models

Models

akhadangi/RLDP (Hugging Face model)
