Stochastic Gradient Descent outperforms Gradient Descent in recovering a high-dimensional signal in a glassy energy landscape
Persia Jana Kamali, Pierfrancesco Urbani

TL;DR
This paper uses dynamical mean field theory to show that Stochastic Gradient Descent (SGD) outperforms Gradient Descent (GD) in recovering a high-dimensional signal in a glassy, non-convex energy landscape, provided the batch size is sufficiently small.
Contribution
It provides a dynamical mean field theory benchmark of SGD against GD on a prototypical high-dimensional non-convex optimization problem, the recovery of a hidden non-linearly encrypted signal, and shows that SGD performs better.
Findings
SGD outperforms GD for small batch sizes
Recovery threshold for SGD is lower than for GD
Power law fit shows faster relaxation times for SGD
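The power-law fit of relaxation times can be illustrated with a minimal sketch. It assumes that the relaxation time diverges as tau(lam) ≈ C (lam - lam_c)^(-gamma) as a control parameter lam approaches a recovery threshold lam_c from the recoverable side. The summary does not state the exact form the authors fit, so this form, the function name `fit_power_law`, and the grid-scan strategy are all assumptions for illustration.

```python
import numpy as np


def fit_power_law(lam, tau, lam_c_grid):
    """Hypothetical helper: fit tau = C * (lam - lam_c)**(-gamma).

    For each candidate threshold lam_c the model is linear in log space,
    log(tau) = log(C) - gamma * log(lam - lam_c), so we do an ordinary
    least-squares line fit and keep the lam_c with the smallest residual.
    """
    lam = np.asarray(lam, dtype=float)
    tau = np.asarray(tau, dtype=float)
    y = np.log(tau)
    best = None
    for lam_c in lam_c_grid:
        if lam_c >= lam.min():
            continue  # the threshold must lie below every measured point
        x = np.log(lam - lam_c)
        slope, intercept = np.polyfit(x, y, 1)  # highest degree first
        resid = np.sum((y - (slope * x + intercept)) ** 2)
        if best is None or resid < best[0]:
            best = (resid, lam_c, -slope, np.exp(intercept))
    if best is None:
        raise ValueError("no admissible lam_c in the grid")
    _, lam_c, gamma, c = best
    return lam_c, gamma, c
```

Fitting GD and SGD relaxation times separately and comparing the two extracted lam_c values is one way to compare recovery thresholds. Scanning lam_c on a grid keeps each inner fit linear; a joint nonlinear fit (for example with SciPy) would be an alternative.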
Abstract
Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used extensively to train artificial neural networks. However, very little is known about the extent to which SGD is crucial to the success of this technology and, in particular, how effective it is at optimizing high-dimensional non-convex cost functions compared with other optimization algorithms such as Gradient Descent (GD). In this work we leverage dynamical mean field theory to benchmark its performance in the high-dimensional limit. To do so, we consider the problem of recovering a hidden high-dimensional non-linearly encrypted signal, a prototypical hard high-dimensional non-convex optimization problem. We compare the performance of SGD with that of GD and show that SGD largely outperforms GD for sufficiently small batch sizes. In particular, a power law fit of the relaxation time of these algorithms shows that the…
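To make the GD/SGD comparison concrete, here is a minimal finite-size sketch in Python with NumPy. Everything in it is an assumption for illustration, not the paper's model or its dynamical mean field theory: the abstract does not specify the encryption, so noiseless phase retrieval on the sphere, y_mu = (a_mu · x*)^2 / n, stands in for it; SGD is modeled by keeping each sample independently with probability b (the batch fraction) at every step; and the helpers `make_problem`, `loss`, `grad` and `run` are hypothetical names.

```python
import numpy as np


def make_problem(n, alpha, rng):
    """Hypothetical stand-in task (assumption): noiseless phase retrieval with
    m = alpha * n measurements and a signal on the sphere |x|^2 = n."""
    m = int(alpha * n)
    x_star = rng.standard_normal(n)
    x_star *= np.sqrt(n) / np.linalg.norm(x_star)
    A = rng.standard_normal((m, n))
    y = (A @ x_star) ** 2 / n
    return A, y, x_star


def loss(x, A, y, mask):
    """0.25 * sum over samples of mask_mu * ((a_mu . x)^2 / n - y_mu)^2."""
    n = x.size
    r = (A @ x) ** 2 / n - y
    return 0.25 * np.sum(mask * r ** 2)


def grad(x, A, y, mask):
    """Exact gradient of `loss` with respect to x."""
    n = x.size
    h = A @ x
    r = (h ** 2 / n - y) * mask  # samples with mask 0 contribute nothing
    return A.T @ (r * h) / n


def run(A, y, x_star, b, lr, steps, rng):
    """Projected descent on the sphere. b = 1.0 is full-batch GD; b < 1 keeps
    each sample with probability b per step (assumed SGD model)."""
    m, n = A.shape
    x = rng.standard_normal(n)
    x *= np.sqrt(n) / np.linalg.norm(x)
    for _ in range(steps):
        if b >= 1.0:
            mask = np.ones(m)
        else:
            # rescaling by 1/b makes the expected step equal the GD step
            mask = (rng.random(m) < b).astype(float) / b
        x = x - lr * grad(x, A, y, mask)
        x *= np.sqrt(n) / np.linalg.norm(x)  # project back onto the sphere
    return abs(x @ x_star) / n  # overlap with the hidden signal, in [0, 1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A, y, x_star = make_problem(n=100, alpha=4.0, rng=rng)
    for b in (1.0, 0.1):  # b = 1.0 is GD, b = 0.1 is small-batch SGD
        q = run(A, y, x_star, b=b, lr=0.01, steps=1000, rng=rng)
        print(f"b={b}: overlap={q:.3f}")
```

Because the surviving samples are rescaled by 1/b, the two algorithms differ only in sampling noise, which isolates the effect the abstract attributes to small batches. Whether SGD reaches a higher overlap than GD in this toy setup depends on n, alpha, lr and the number of steps, so no outcome is claimed here.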
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Privacy-Preserving Technologies in Data
Methods: Stochastic Gradient Descent
