FlashDP: Private Training Large Language Models with Efficient DP-SGD

Liangyu Wang; Junxiao Wang; Jie Ren; Zihang Xiang; David E. Keyes; Di Wang

arXiv:2507.01154·cs.LG·July 3, 2025

Open Access

TL;DR

FlashDP is a cache-friendly, per-layer DP-SGD implementation that computes gradients in a single pass. It cuts memory movement by up to 50% and redundant computation by 20%, and reaches 90% of non-DP training throughput on A100 GPUs, without sacrificing accuracy.

Contribution

The paper presents FlashDP, a DP-SGD implementation that reduces memory movement and redundant computation, making differentially private training of large language models more efficient.

Findings

01 Reduces memory movement by up to 50%.

02 Cuts redundant computations by 20%.

03 Achieves 90% throughput of non-DP training on A100 GPUs.

Abstract

As large language models (LLMs) increasingly underpin technological advancements, the privacy of their training data emerges as a critical concern. Differential Privacy (DP) serves as a rigorous mechanism to protect this data, yet its integration via Differentially Private Stochastic Gradient Descent (DP-SGD) introduces substantial challenges, primarily due to the complexities of per-sample gradient clipping. Current explicit methods, such as Opacus, necessitate extensive storage for per-sample gradients, significantly inflating memory requirements. Conversely, implicit methods like GhostClip reduce storage needs by recalculating gradients multiple times, which leads to inefficiencies due to redundant computations. This paper introduces FlashDP, an innovative cache-friendly per-layer DP-SGD that consolidates necessary operations into a single task, calculating gradients only once in a…
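
The per-sample clipping step that the abstract refers to can be sketched as follows. This is a minimal illustration of generic, explicit DP-SGD aggregation (materialize every per-sample gradient, clip each one, sum, add Gaussian noise), not FlashDP's fused single-pass kernel. The function name `clip_and_noise` and its parameters are hypothetical, and plain Python lists stand in for tensors.

```python
import math
import random


def clip_and_noise(per_sample_grads, clip_norm, noise_multiplier, rng):
    """Generic DP-SGD aggregation (illustrative sketch, not FlashDP itself).

    per_sample_grads: list of gradient vectors, one per training sample.
    clip_norm:        maximum L2 norm allowed for each per-sample gradient.
    noise_multiplier: Gaussian noise std is noise_multiplier * clip_norm.
    rng:              a random.Random instance used to draw the noise.
    """
    dim = len(per_sample_grads[0])
    total = [0.0] * dim

    for grad in per_sample_grads:
        # L2 norm of this sample's gradient.
        norm = math.sqrt(sum(x * x for x in grad))
        # Scale down only if the norm exceeds the clipping threshold.
        scale = min(1.0, clip_norm / (norm + 1e-12))
        for j in range(dim):
            total[j] += grad[j] * scale

    # Add Gaussian noise to the clipped sum, then average over the batch.
    sigma = noise_multiplier * clip_norm
    batch_size = len(per_sample_grads)
    return [(t + rng.gauss(0.0, sigma)) / batch_size for t in total]


if __name__ == "__main__":
    grads = [[3.0, 4.0], [0.3, 0.4]]  # L2 norms 5.0 and 0.5
    print(clip_and_noise(grads, clip_norm=1.0, noise_multiplier=1.0,
                         rng=random.Random(0)))
```

Holding all of `per_sample_grads` in memory at once is the storage cost the abstract attributes to explicit methods such as Opacus; implicit methods avoid that storage at the price of recomputing gradients.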

Taxonomy

Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Mobile Crowdsensing and Crowdsourcing