DP-Forward: Fine-tuning and Inference on Language Models with Differential Privacy in Forward Pass

Minxin Du; Xiang Yue; Sherman S. M. Chow; Tianhao Wang; Chenyu Huang; and Huan Sun

arXiv:2309.06746·cs.CR·September 20, 2023



TL;DR

DP-Forward protects both fine-tuning and inference data in language models by perturbing embedding matrices directly in the forward pass. It satisfies local differential privacy while delivering higher utility and lower time and memory costs than DP-SGD.

Contribution

The paper proposes DP-Forward, which applies differential privacy directly to embedding matrices in the forward pass of language models, and devises an analytic matrix Gaussian mechanism (aMGM) to instantiate it with the smallest matrix-valued noise.

Findings

01 Utility nearly matches non-private baselines

02 Outperforms DP-SGD by up to 7.7 percentage points

03 Reduces time and memory costs by a factor of 3

Abstract

Differentially private stochastic gradient descent (DP-SGD) adds noise to gradients in back-propagation, safeguarding training data from privacy leakage, particularly membership inference. It fails to cover (inference-time) threats like embedding inversion and sensitive attribute inference. It is also costly in storage and computation when used to fine-tune large pre-trained language models (LMs). We propose DP-Forward, which directly perturbs embedding matrices in the forward pass of LMs. It satisfies stringent local DP requirements for training and inference data. To instantiate it using the smallest matrix-valued noise, we devise an analytic matrix Gaussian mechanism (aMGM) by drawing possibly non-i.i.d. noise from a matrix Gaussian distribution. We then investigate perturbing outputs from different hidden (sub-)layers of LMs with aMGM noise. Its utility on three typical tasks…
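
The forward-pass perturbation described in the abstract can be illustrated with a minimal sketch. It is not the paper's aMGM: it swaps in the classical Gaussian mechanism with i.i.d. noise, whose calibration σ = Δ·sqrt(2 ln(1.25/δ))/ε holds only for 0 < ε < 1. The function names, the Frobenius-norm clipping step, and the sensitivity bound of 2·max_norm (any two clipped inputs differ by at most that much) are assumptions made for illustration, not details taken from the paper or its repository.

```python
import math
import random


def clip_frobenius(matrix, max_norm):
    """Scale the matrix down so its Frobenius norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for row in matrix for x in row))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [[x * scale for x in row] for row in matrix]


def classical_gaussian_sigma(epsilon, delta, sensitivity):
    """Noise scale of the classical Gaussian mechanism (NOT the paper's aMGM).

    The bound sigma = sensitivity * sqrt(2 ln(1.25/delta)) / epsilon
    is only valid for 0 < epsilon < 1.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("classical calibration requires 0 < epsilon < 1")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def perturb_embedding(matrix, epsilon, delta, max_norm, rng):
    """Clip an embedding matrix, then add i.i.d. Gaussian noise to each entry.

    Assumption: in the local setting any two inputs count as neighbors, so
    after clipping the L2 sensitivity is at most 2 * max_norm.
    """
    clipped = clip_frobenius(matrix, max_norm)
    sigma = classical_gaussian_sigma(epsilon, delta, 2.0 * max_norm)
    return [[x + rng.gauss(0.0, sigma) for x in row] for row in clipped]


# Hypothetical 3-token sequence with 4-dimensional embeddings.
rng = random.Random(0)
embedding = [[rng.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(3)]
noisy = perturb_embedding(embedding, epsilon=0.5, delta=1e-5, max_norm=1.0, rng=rng)
```

In this sketch the noisy matrix would stand in for the clean embeddings passed to later layers, during both fine-tuning and inference. According to the abstract, aMGM instead draws possibly non-i.i.d. noise from a matrix Gaussian distribution to keep the noise as small as possible, so this baseline is looser than the mechanism the paper proposes.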


Code & Models

Repositories

xiangyue9607/dp-forward
PyTorch · Official
