Weights Shuffling for Improving DPSGD in Transformer-based Models
Jungang Yang, Zhe Ji, Liyao Xiang

TL;DR
This paper proposes a novel shuffling mechanism in DPSGD to improve privacy-utility trade-offs in large transformer models, supported by theoretical analysis and empirical validation.
Contribution
It introduces a permutation-based shuffling method in DPSGD that enhances privacy guarantees and model utility without additional privacy loss, validated through theory and experiments.
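For context, the baseline that the shuffling method builds on can be sketched as follows. This is a minimal, stdlib-only illustration of one standard DPSGD update (clip each per-example gradient, sum, add Gaussian noise, average), not the paper's shuffled variant. The names `clip`, `dpsgd_step`, `max_norm`, and `noise_multiplier` are illustrative assumptions and do not come from the paper.

```python
import math
import random


def clip(grad, max_norm):
    """Scale a gradient vector so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dpsgd_step(params, per_example_grads, lr, max_norm, noise_multiplier, rng):
    """One standard DPSGD update (hypothetical helper, not the paper's code)."""
    # 1. Bound each example's influence by clipping its gradient.
    clipped = [clip(g, max_norm) for g in per_example_grads]
    # 2. Sum the clipped gradients coordinate-wise.
    summed = [sum(column) for column in zip(*clipped)]
    # 3. Add Gaussian noise with standard deviation noise_multiplier * max_norm.
    noisy = [s + rng.gauss(0.0, noise_multiplier * max_norm) for s in summed]
    # 4. Average over the batch and take a gradient step.
    batch_size = len(per_example_grads)
    return [p - lr * n / batch_size for p, n in zip(params, noisy)]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # toy per-example gradients
new_params = dpsgd_step([0.0, 0.0], grads, lr=0.1, max_norm=1.0,
                        noise_multiplier=1.0, rng=rng)
```

The clipping bound limits how much any single example can move the parameters, and the noise scale is tied to that bound. The paper's claim is that shuffling adds randomness on top of this step without spending additional privacy budget.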
Findings
Shuffling adds beneficial randomness to gradient descent trajectories.
Theoretical privacy guarantees are close to actual privacy levels.
Experimental results show improved accuracy over state-of-the-art baselines.
Abstract
Differential Privacy (DP) mechanisms, especially in high-dimensional settings, often face the challenge of maintaining privacy without compromising data utility. This work introduces a shuffling mechanism in Differentially-Private Stochastic Gradient Descent (DPSGD) that enhances the utility of large models under the same privacy guarantee as the unshuffled case. Specifically, we reveal that random shuffling brings additional randomness to the trajectory of gradient descent without affecting model accuracy, thanks to the permutation invariance property: the model can be computed equivalently in both forward and backward propagation under permutation. We show that permutation indeed improves the privacy guarantee of DPSGD in theory, but tracking the exact privacy loss on the shuffled model is particularly challenging. Hence we exploit the approximation on sum of lognormal…
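The permutation invariance property mentioned in the abstract can be illustrated with a small sketch. The example below uses a two-layer MLP rather than a transformer, and checks only the forward pass; both are simplifying assumptions made here for brevity. It permutes the hidden units (rows of `W1` and `b1`, and the matching columns of `W2`) and confirms that the output is unchanged up to floating-point rounding. All names are hypothetical.

```python
import math
import random


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def mlp(W1, b1, W2, x):
    """Two-layer MLP with a tanh hidden layer and a linear output layer."""
    hidden = [math.tanh(z + b) for z, b in zip(matvec(W1, x), b1)]
    return matvec(W2, hidden)


def permute_hidden(W1, b1, W2, perm):
    """Reorder hidden units consistently across both layers."""
    W1_p = [W1[j] for j in perm]                   # permute rows of W1
    b1_p = [b1[j] for j in perm]                   # permute hidden biases
    W2_p = [[row[j] for j in perm] for row in W2]  # permute columns of W2
    return W1_p, b1_p, W2_p


rng = random.Random(0)
d_in, d_hidden, d_out = 3, 5, 2
W1 = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_hidden)]
b1 = [rng.gauss(0, 1) for _ in range(d_hidden)]
W2 = [[rng.gauss(0, 1) for _ in range(d_hidden)] for _ in range(d_out)]
x = [rng.gauss(0, 1) for _ in range(d_in)]

perm = list(range(d_hidden))
rng.shuffle(perm)

y = mlp(W1, b1, W2, x)
y_perm = mlp(*permute_hidden(W1, b1, W2, perm), x)
max_diff = max(abs(a - b) for a, b in zip(y, y_perm))
```

Because the permuted weights compute the same function, an observer of the weights sees an extra source of randomness while the model's predictions are unaffected.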
Taxonomy
Topics: Magnetic Properties and Applications
