TL;DR
DP-GRAPE is a memory-efficient differentially private training method that replaces costly SVD computations with random projections, maintaining utility while significantly reducing memory usage.
Contribution
Introduces DP-GRAPE, a novel DP training approach using random Gaussian projections to eliminate SVD, enabling scalable, memory-efficient privacy-preserving neural network training.
Findings
Reduces memory usage by over 63% for Vision Transformers.
Achieves over 70% memory reduction when fine-tuning RoBERTa-Large.
Scales to large models like OPT with 6.7 billion parameters.
Abstract
Differential privacy (DP) protects sensitive data during neural network training, but standard methods like DP-Adam suffer from high memory overhead due to per-sample gradient clipping, limiting scalability. We introduce DP-GRAPE (Gradient RAndom ProjEction), a DP training method that significantly reduces memory usage while maintaining utility on par with first-order DP approaches. DP-GRAPE is motivated by our finding that privatization flattens the gradient singular value spectrum, making SVD-based projections (as in GaLore (Zhao et al., 2024)) unnecessary. Consequently, DP-GRAPE employs three key components: (1) random Gaussian matrices replace SVD-based subspaces, (2) gradients are privatized after projection, and (3) projection is applied during backpropagation. These contributions eliminate the need for costly SVD computations, enable substantial memory savings, and lead to…
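The components listed in the abstract can be illustrated with a small NumPy sketch. It is not the authors' implementation: the function names, the right-side projection of an m × n gradient, the 1/sqrt(r) scaling of the Gaussian matrix, and the plain averaged update are all assumptions made for illustration. For simplicity the sketch materializes full per-sample gradients before projecting, so it shows the random projection and the project-then-privatize order (components 1 and 2) but not the memory saving that comes from projecting during backpropagation (component 3).

```python
import numpy as np

def gaussian_projection(n, r, rng):
    # Hypothetical helper: random Gaussian matrix used in place of an
    # SVD-derived subspace. Entries have variance 1/r, so E[P @ P.T] = I_n.
    return rng.normal(0.0, 1.0 / np.sqrt(r), size=(n, r))

def project_and_clip(per_sample_grads, P, clip_norm):
    # Project each per-sample gradient: (B, m, n) @ (n, r) -> (B, m, r).
    low = per_sample_grads @ P
    # Clip each projected gradient to Frobenius norm at most clip_norm.
    norms = np.linalg.norm(low.reshape(low.shape[0], -1), axis=1)
    scale = np.minimum(1.0, clip_norm / (norms + 1e-12))
    return low * scale[:, None, None]

def privatize_and_lift(clipped, P, clip_norm, noise_multiplier, rng):
    # Add Gaussian noise in the low-dimensional space (after projection),
    # average over the batch, then map back to the original shape.
    batch_size = clipped.shape[0]
    noise = rng.normal(0.0, noise_multiplier * clip_norm, size=clipped.shape[1:])
    private_low = (clipped.sum(axis=0) + noise) / batch_size
    return private_low @ P.T  # (m, r) @ (r, n) -> (m, n)

# Toy usage with made-up sizes: 8 samples, a 16 x 64 weight matrix, rank 4.
rng = np.random.default_rng(0)
grads = rng.normal(size=(8, 16, 64))
P = gaussian_projection(n=64, r=4, rng=rng)
clipped = project_and_clip(grads, P, clip_norm=1.0)
update = privatize_and_lift(clipped, P, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
```

Because clipping and noise are applied to the (B, m, r) projected tensors, the per-sample quantities that must be stored are r/n the size of the full gradients in this toy setting. A real implementation would also need a privacy accountant, which the sketch omits.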
Peer Reviews
Decision · Submitted to ICLR 2026
- Uses random projections (DP-GRAPE) instead of SVD-based projections, which is memory-efficient.
- DP-GRAPE (Gradient RAndom ProjEction) achieves a privacy-utility trade-off comparable to DP-SGD.
- The experiments show significant memory reductions while preserving accuracy.
- Comparisons with SOTA methods and other subspace methods are insufficient.
- A robustness analysis of failure cases is missing.
The observation about spectral flattening is novel and provides a principled reason to abandon SVD-based projections. The authors provide a theoretical privacy and convergence analysis for DP-GRAPE, which is non-trivial due to the introduction of random projections. Evaluations cover both CV (ViT pre-training) and NLP (RoBERTa, OPT). Achieves large-scale DP training (OPT, 6.7B). Memory savings in training are considerable: it cuts memory by over 63% in Vision Transformer training and 70% in R
The privacy guarantee under random projections with unbounded entries is described informally. A more rigorous sensitivity or RDP proof sketch is needed. DP-GRAPE’s algorithm is more complex to implement than vanilla DP-SGD/DP-Adam. I'm not sure how practical it would be to implement. There is no mention of code.
Originality. The paper advances DP training by coupling project-then-privatize gradient handling with random low-rank projections, motivated by the observation that privatization flattens the gradient spectrum. Quality. The paper provides rigorous theoretical guarantees and offers reproducible implementation details and hyperparameter guidance. Clarity. Figures, tables, and the presentation of the algorithm are clear with consistent notation that makes the method easy to follow. Significance. D
Limited novelty (main concern). Algorithmically, the core move—projecting gradients into a low-dimensional subspace and then privatizing—is a direct transplant of low-rank / random-projection ideas into the DP setting; the paper does not introduce a fundamentally new optimization principle. On the theory side, the guarantees largely read as an incremental generalization of standard DP-SGD analyses to the projected case. Missing head-to-head experiments with the methods surveyed in Table 1. Tabl
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms
