Differentially Private Clipped-SGD: High-Probability Convergence with Arbitrary Clipping Level
Saleh Vatan Khah, Savelii Chezhegov, Shahrokh Farahmand, Samuel Horváth, Eduard Gorbunov

TL;DR
This paper provides the first high-probability convergence analysis for differentially private clipped SGD with a fixed clipping level, applicable to both convex and non-convex optimization under heavy-tailed noise, balancing convergence and privacy.
Contribution
It introduces a novel convergence analysis for DP-Clipped-SGD with fixed clipping, applicable to heavy-tailed noise, and offers a refined trade-off between convergence speed and privacy guarantees.
Findings
With a fixed clipping level, DP-Clipped-SGD converges to a neighborhood of the optimum at a faster rate than existing analyses establish.
Applicable to both convex and non-convex smooth optimization.
Balances convergence speed with differential privacy constraints.
Abstract
Gradient clipping is a fundamental tool in Deep Learning, improving the high-probability convergence of stochastic first-order methods like SGD, AdaGrad, and Adam under heavy-tailed noise, which is common in training large language models. It is also a crucial component of Differential Privacy (DP) mechanisms. However, existing high-probability convergence analyses typically require the clipping threshold to increase with the number of optimization steps, which is incompatible with standard DP mechanisms like the Gaussian mechanism. In this work, we close this gap by providing the first high-probability convergence analysis for DP-Clipped-SGD with a fixed clipping level, applicable to both convex and non-convex smooth optimization under heavy-tailed noise, characterized by a bounded central $\alpha$-th moment assumption, $\alpha \in (1, 2]$. Our results show that, with a fixed clipping…
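As an illustration of the method the abstract describes, the following is a minimal Python sketch of clipped SGD with a fixed clipping level and Gaussian noise added to each clipped gradient. It is not the authors' implementation: the names `clip`, `dp_clipped_sgd`, `lam`, and `noise_std`, the toy quadratic objective, and the Student-t gradient noise are all assumptions chosen for illustration. Calibrating `noise_std` to a target privacy budget is deliberately left out.

```python
import numpy as np


def clip(g, lam):
    """Rescale g so that its Euclidean norm is at most lam."""
    norm = np.linalg.norm(g)
    if norm <= lam:
        return g
    return g * (lam / norm)


def dp_clipped_sgd(grad_fn, x0, steps, lr, lam, noise_std, rng):
    """Run clipped SGD with a fixed clipping level and Gaussian perturbation.

    grad_fn(x, rng) returns a stochastic gradient at x (hypothetical interface).
    lam is the clipping level; it stays constant across all iterations.
    noise_std is the standard deviation of the added Gaussian noise; relating
    it to a privacy budget is outside the scope of this sketch.
    """
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(steps):
        g = clip(grad_fn(x, rng), lam)                    # bounded-norm gradient
        noise = rng.normal(0.0, noise_std, size=x.shape)  # Gaussian perturbation
        x = x - lr * (g + noise)
    return x


if __name__ == "__main__":
    rng = np.random.default_rng(0)

    def noisy_grad(x, rng):
        # Gradient of f(x) = 0.5 * ||x||^2 plus Student-t noise with 1.5
        # degrees of freedom, which has infinite variance (heavy-tailed).
        return x + rng.standard_t(1.5, size=x.shape)

    x_final = dp_clipped_sgd(
        noisy_grad, x0=[10.0, -10.0], steps=2000,
        lr=0.01, lam=1.0, noise_std=0.5, rng=rng,
    )
    print(np.linalg.norm(x_final))
```

Because `lam` does not grow with the number of steps, the norm of every clipped gradient is bounded by the same constant throughout the run, which is the property that makes a fixed noise scale meaningful for mechanisms such as the Gaussian mechanism.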
Taxonomy
Topics: Cryptography and Data Security · Complexity and Algorithms in Graphs · Stochastic Gradient Optimization Techniques
