On the Convergence of Stochastic Gradient Descent with Perturbed Forward-Backward Passes
Boao Kong, Hengrui Zhang, Kun Yuan

TL;DR
This paper provides a comprehensive theoretical analysis of stochastic gradient descent with perturbations in both forward and backward passes, revealing how these perturbations propagate and affect convergence in deep learning models.
Contribution
The paper presents what the authors describe as the first comprehensive analysis of perturbation effects in SGD over a sequence of operators, including convergence guarantees and a theoretical explanation of gradient spikes.
Findings
Perturbations cascade and amplify through the computational graph.
Convergence guarantees are established for non-convex and Polyak-Łojasiewicz functions.
Experiments validate the theory and illustrate spike behavior and sensitivity differences.
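The first finding can be illustrated with a toy sketch. It is not the paper's model: it assumes every operator is the scalar map f(x) = gain * x and that each forward evaluation picks up the same additive error eps. The names `forward`, `output_error`, `gain`, and `eps` are hypothetical and chosen only for this illustration. Under these assumptions the output error after K operators is eps * (gain^K - 1) / (gain - 1), which grows geometrically in K whenever gain > 1.

```python
def forward(x, num_ops, gain, eps=0.0):
    """Apply num_ops scalar operators f(x) = gain * x in sequence.

    eps is an illustrative additive perturbation injected after every
    operator; eps = 0.0 gives the exact (unperturbed) forward pass.
    """
    for _ in range(num_ops):
        x = gain * x + eps
    return x


def output_error(num_ops, gain=2.0, eps=1e-3, x0=1.0):
    """Absolute gap between the perturbed and exact outputs."""
    perturbed = forward(x0, num_ops, gain, eps)
    exact = forward(x0, num_ops, gain)
    return abs(perturbed - exact)


# Error at the output for chains of 1 to 6 operators.
errors = [output_error(k) for k in range(1, 7)]

for k, err in enumerate(errors, start=1):
    print(f"operators={k}  output error={err:.6f}")
```

With gain = 2 each additional operator roughly doubles the accumulated error, while the per-operator perturbation stays fixed. This is the cascading effect in its simplest form. The backward pass and stochastic gradients are left out of the sketch.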
Abstract
We study stochastic gradient descent (SGD) for composite optimization problems with sequential operators subject to perturbations in both the forward and backward passes. Unlike classical analyses that treat gradient noise as additive and localized, perturbations to intermediate outputs and gradients cascade through the computational graph, compounding geometrically with the number of operators. We present the first comprehensive theoretical analysis of this setting. Specifically, we characterize how forward and backward perturbations propagate and amplify within a single gradient step, derive convergence guarantees for both general non-convex objectives and functions satisfying the Polyak-Łojasiewicz condition, and identify conditions under which perturbations do not deteriorate the asymptotic convergence order. As a byproduct, our analysis furnishes a theoretical explanation…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
