LMO-DP: Optimizing the Randomization Mechanism for Differentially Private Fine-Tuning (Large) Language Models

Qin Yang; Meisam Mohammad; Han Wang; Ali Payani; Ashish Kundu; Kai Shu; Yan Yan; Yuan Hong

arXiv:2405.18776·cs.CR·May 30, 2024

TL;DR

This paper introduces LMO-DP, a novel mechanism for differentially private fine-tuning of large language models that reduces noise and improves accuracy, especially under strong privacy constraints.

Contribution

The paper proposes LMO-DP, a new DP mechanism with an offline optimal noise search, enabling more accurate private fine-tuning of large language models in strong privacy regimes.

Findings

01

Achieves 92.20% accuracy on SST-2 with $ϵ = 0.3$

02

Significantly outperforms the Gaussian mechanism in strong privacy regimes

03

First to accurately fine-tune Llama-2 with strong DP guarantees

Abstract

Differentially Private Stochastic Gradient Descent (DP-SGD) and its variants have been proposed to ensure rigorous privacy for fine-tuning large-scale pre-trained language models. However, they rely heavily on the Gaussian mechanism, which may overly perturb the gradients and degrade the accuracy, especially in stronger privacy regimes (e.g., the privacy budget $ϵ < 3$). To address such limitations, we propose a novel Language Model-based Optimal Differential Privacy (LMO-DP) mechanism, which takes the first step to enable the tight composition of accurately fine-tuning (large) language models with a sub-optimal DP mechanism, even in strong privacy regimes (e.g., $0.1 \leq ϵ < 3$). Furthermore, we propose a novel offline optimal noise search method to efficiently derive the sub-optimal DP that significantly reduces the noise magnitude. For instance, fine-tuning RoBERTa-large…
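
To make the baseline that the abstract criticizes concrete, here is a minimal sketch of one DP-SGD aggregation step with the Gaussian mechanism: each per-example gradient is clipped to a fixed L2 norm, the clipped gradients are summed, Gaussian noise is added, and the result is averaged. This illustrates the standard Gaussian baseline only, not LMO-DP or its offline noise search. The function names and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, and the toy gradients are made up.

```python
import math
import random


def clip_gradient(grad, clip_norm):
    """Scale grad so that its L2 norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_sgd_gaussian_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return the noisy average of the clipped per-example gradients.

    Assumption: the noise standard deviation is noise_multiplier * clip_norm,
    as in the usual DP-SGD formulation. Turning (epsilon, delta) into a
    noise_multiplier requires a privacy accountant, which is omitted here.
    """
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, clip_norm) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    sigma = noise_multiplier * clip_norm
    noisy = [s + rng.gauss(0.0, sigma) for s in summed]
    return [v / len(per_example_grads) for v in noisy]


# Toy usage with hypothetical two-dimensional gradients.
rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4], [-6.0, 8.0]]
update = dp_sgd_gaussian_step(grads, clip_norm=1.0, noise_multiplier=1.1, rng=rng)
print(update)
```

A smaller privacy budget ϵ forces a larger `noise_multiplier`, so the added noise increasingly dominates the clipped gradient sum. That is the accuracy loss in strong privacy regimes that the abstract attributes to the Gaussian mechanism.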

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Cosine Annealing · Layer Normalization · Weight Decay · Attention Dropout · Linear Layer · Linear Warmup With Cosine Annealing · Byte Pair Encoding · Adam · Attention Is All You Need