Convergent Privacy Loss of Noisy-SGD without Convexity and Smoothness
Eli Chien, Pan Li

TL;DR
This paper establishes convergent differential privacy bounds for Noisy-SGD on non-convex, non-smooth losses, extending prior results that required smoothness and (strong) convexity, and it tightens the known bound for smooth strongly convex losses.
Contribution
It proves convergent Rényi DP bounds for non-convex, non-smooth losses with Hölder continuous gradients, relaxing the smoothness and convexity assumptions of prior work.
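For reference, the two notions named above can be written out as follows. These are the standard textbook definitions rather than notation copied from the paper; the symbols $L$, $\lambda$, $\alpha$, $P$, $Q$, and $M$ are assumed here for illustration.

```latex
% Hölder continuous gradient with constant L > 0 and exponent \lambda \in (0, 1];
% \lambda = 1 recovers the usual L-smoothness (Lipschitz gradient).
\|\nabla f(x) - \nabla f(y)\| \le L \, \|x - y\|^{\lambda} \quad \text{for all } x, y .

% Rényi divergence of order \alpha > 1 between distributions P and Q.
D_{\alpha}(P \,\|\, Q) = \frac{1}{\alpha - 1}
  \log \mathbb{E}_{x \sim Q}\!\left[ \left( \frac{P(x)}{Q(x)} \right)^{\alpha} \right]

% A mechanism M is (\alpha, \varepsilon)-Rényi DP if, for all adjacent datasets D and D',
D_{\alpha}\big( M(D) \,\|\, M(D') \big) \le \varepsilon .
```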
Findings
Convergent DP bounds for non-convex, non-smooth losses.
Improved privacy bounds for strongly convex losses.
Enhanced analysis techniques using shifted divergence and Wasserstein distance.
Abstract
We study the Differential Privacy (DP) guarantee of hidden-state Noisy-SGD algorithms over a bounded domain. Standard privacy analysis for Noisy-SGD assumes all internal states are revealed, which leads to a divergent Rényi DP bound with respect to the number of iterations. Ye & Shokri (2022) and Altschuler & Talwar (2022) proved convergent bounds for smooth (strongly) convex losses, and raise open questions about whether these assumptions can be relaxed. We provide positive answers by proving convergent Rényi DP bound for non-convex non-smooth losses, where we show that requiring losses to have Hölder continuous gradient is sufficient. We also provide a strictly better privacy bound compared to state-of-the-art results for smooth strongly convex losses. Our analysis relies on the improvement of shifted divergence analysis in multiple aspects, including forward Wasserstein distance…
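The hidden-state setting described in the abstract can be illustrated with a minimal sketch of projected Noisy-SGD that releases only the final iterate. This is not the authors' implementation: the function names, the per-example clipping step, the ball-shaped domain, the noise scaling, and sampling without replacement are all assumptions made for illustration.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def project_to_ball(w: Vector, radius: float) -> Vector:
    """Euclidean projection onto the origin-centred ball of the given radius."""
    norm = math.sqrt(sum(x * x for x in w))
    if norm <= radius:
        return list(w)
    scale = radius / norm
    return [x * scale for x in w]


def noisy_sgd_last_iterate(
    data: Sequence[Vector],
    grad: Callable[[Vector, Vector], Vector],
    dim: int,
    steps: int,
    batch_size: int,
    lr: float,
    sigma: float,
    radius: float,
    clip_norm: float,
    seed: int = 0,
) -> Vector:
    """Run projected Noisy-SGD and return only the last iterate (hidden state).

    Assumptions (illustrative, not taken from the paper): the bounded domain
    is a Euclidean ball, per-example gradients are clipped to `clip_norm`,
    and Gaussian noise with standard deviation `sigma` is added to the
    averaged gradient before the step.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    for _ in range(steps):
        # Draw a mini-batch without replacement.
        batch = rng.sample(list(data), batch_size)
        avg = [0.0] * dim
        for z in batch:
            # Clipping a gradient is the same operation as projecting it onto a ball.
            g = project_to_ball(grad(w, z), clip_norm)
            avg = [a + gi / batch_size for a, gi in zip(avg, g)]
        noise = [rng.gauss(0.0, sigma) for _ in range(dim)]
        step = [wi - lr * (ai + ni) for wi, ai, ni in zip(w, avg, noise)]
        # Projection keeps every iterate inside the bounded domain.
        w = project_to_ball(step, radius)
    # Intermediate iterates are never released; only the final one is.
    return w
```

A standard composition-based analysis would treat every intermediate `w` as released, which is why its bound grows with `steps`; in the hidden-state setting only the returned value is observed.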
Peer Reviews
Decision: ICLR 2025 Poster
- The paper improves upon the state-of-the-art DP bounds for Noisy-SGD on strongly-convex and smooth losses.
- The relaxation of the assumption from Lipschitz-continuity to Hölder-continuity is a step forward towards an assumption-lean convergent DP bound for Noisy-SGD.
- The paper provides DP bounds under the two most common batch subsampling techniques.
- The ideas in the paper are well illustrated and presented well for a technical audience.
- The paper claims that under non-convex but smooth los
- The main results in Theorem 3.1, 3.6 and 3.11 and Theorem aren't provided in a closed-form. This makes them hard to operationalize.
- The computational complexity of the presented bounds should be discussed.
- The full-batch setting of the strongly-convex and smooth case in Theorem 3.1 isn't compared with the bound in Chourasia et al. under identical assumptions.
- Utility analysis is missing. For (strongly) convex and smooth losses, the improved DP bounds can yield better utility bounds that shoul
The strengths of this paper lie in its innovative approach to privacy loss analysis in Noisy-SGD. Unlike prior work that requires assumptions of smoothness and convexity, this paper successfully establishes privacy bounds under more general conditions, making it applicable to a broader range of non-smooth, non-convex problems. Key strengths include:
1. **Generalization with Hölder Continuity**: The paper introduces the concept of Hölder continuous gradients, allowing the privacy analysis to hol
1. **Unclear Structure**: The structure of the paper lacks clarity. The main text includes numerous proofs, while some theorem statements are placed in the appendix, which disrupts the flow and clarity of the overall structure.
2. **Figure Quality**: Figure 2(a) appears somewhat rough and could benefit from refinement to improve visual quality.
3. **Typo in Lemma 2.3**: There is a typo in Lemma 2.3, where the function notation shifts from \( f \) to \( h \), which should be consistent.
4. **O
This paper is well-written and well-structured, clearly discussing prior works and their methodologies. The topic of convergent privacy loss in noisy SGD is significant, and this work makes meaningful progress in addressing it. Additionally, the results concerning non-convex functions, which can outperform output perturbation, are particularly intriguing.
The novelty and improvements presented are somewhat limited. The Forward Wasserstein distance tracking lemma appears straightforward, and the primary enhancement in the shifted divergence analysis seems to lie in identifying a better allocation of shifts. Regarding the non-convex case, I am concerned that the improvements may not be substantial when compared to output perturbation in practical applications. Hence, I would not be surprised if other reviewers lean towards rejecting the paper, giv
Taxonomy
Topics: Privacy-Preserving Technologies in Data · Cryptography and Data Security · Stochastic Gradient Optimization Techniques
