Stochastic Gradient Descent outperforms Gradient Descent in recovering a high-dimensional signal in a glassy energy landscape

Persia Jana Kamali; Pierfrancesco Urbani

arXiv:2309.04788·cs.LG·December 19, 2023·2 cites

Open Access

TL;DR

Using dynamical mean field theory, this paper shows that Stochastic Gradient Descent (SGD) outperforms Gradient Descent (GD) at recovering a hidden high-dimensional signal in a non-convex (glassy) energy landscape, provided the batch size is sufficiently small.

Contribution

It provides a dynamical mean field theory benchmark showing SGD's superior performance over GD in high-dimensional non-convex optimization problems.

Findings

01. SGD outperforms GD for sufficiently small batch sizes

02. The recovery threshold for SGD is lower than that for GD

03. A power law fit of the relaxation times shows faster relaxation for SGD

Abstract

Stochastic Gradient Descent (SGD) is an out-of-equilibrium algorithm used extensively to train artificial neural networks. However, very little is known about the extent to which SGD is crucial to the success of this technology and, in particular, how effective it is at optimizing high-dimensional non-convex cost functions compared to other optimization algorithms such as Gradient Descent (GD). In this work we leverage dynamical mean field theory to benchmark its performance in the high-dimensional limit. To do that, we consider the problem of recovering a hidden high-dimensional non-linearly encrypted signal, a prototypical high-dimensional non-convex hard optimization problem. We compare the performance of SGD to that of GD and show that SGD largely outperforms GD for sufficiently small batch sizes. In particular, a power law fit of the relaxation time of these algorithms shows that the…
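To make the GD versus SGD comparison concrete, here is a minimal numerical sketch. It is not the authors' model or their dynamical mean field theory analysis: the abstract does not specify the cost function, so the sketch assumes a toy phase-retrieval-style problem in which a hidden vector `x_star` is observed through quadratic measurements `y = (A @ x_star) ** 2`. The spherical constraint, the loss, and the names `loss`, `loss_grad`, `project_to_sphere`, `descend`, and `overlap` are all illustrative assumptions.

```python
import numpy as np

def loss(x, A, y):
    """Assumed toy loss: L(x) = (1/4M) * sum_mu ((a_mu . x)^2 - y_mu)^2."""
    h = A @ x
    return np.sum((h**2 - y) ** 2) / (4 * len(y))

def loss_grad(x, A, y):
    """Gradient of the toy loss with respect to x."""
    h = A @ x
    return A.T @ ((h**2 - y) * h) / len(y)

def project_to_sphere(x):
    """Rescale x so that |x|^2 = N (assumed spherical constraint)."""
    return x * np.sqrt(x.size) / np.linalg.norm(x)

def descend(A, y, x0, lr, steps, batch_size=None, rng=None):
    """Projected descent: full-batch GD if batch_size is None, else minibatch SGD."""
    x = x0.copy()
    M = len(y)
    for _ in range(steps):
        if batch_size is None:
            idx = slice(None)  # use every measurement (GD)
        else:
            idx = rng.choice(M, size=batch_size, replace=False)  # fresh minibatch
        x = project_to_sphere(x - lr * loss_grad(x, A[idx], y[idx]))
    return x

def overlap(x, x_star):
    """Normalized overlap |x . x_star| / N; the sign of x is not identifiable."""
    return abs(x @ x_star) / x.size

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    N, M = 50, 150  # signal dimension and number of measurements (arbitrary choices)
    A = rng.normal(size=(M, N)) / np.sqrt(N)
    x_star = project_to_sphere(rng.normal(size=N))
    y = (A @ x_star) ** 2
    x0 = project_to_sphere(rng.normal(size=N))  # random start, shared by both runs
    x_gd = descend(A, y, x0, lr=0.2, steps=500)
    x_sgd = descend(A, y, x0, lr=0.2, steps=500, batch_size=15, rng=rng)
    print("GD overlap: ", overlap(x_gd, x_star))
    print("SGD overlap:", overlap(x_sgd, x_star))
```

Both runs start from the same random point, so any difference in the final overlap comes from the minibatch noise alone. A single small run like this says nothing about the high-dimensional limit studied in the paper; it only illustrates what "recovering a hidden signal" and "batch size" mean operationally.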

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Markov Chains and Monte Carlo Methods · Privacy-Preserving Technologies in Data

Methods: Stochastic Gradient Descent