Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

Francesca Mignacco; Pierfrancesco Urbani; Lenka Zdeborová

arXiv:2103.04902 · cond-mat.dis-nn · March 22, 2022


TL;DR

This paper compares gradient-descent-based algorithms on the non-convex phase retrieval problem and shows that, at limited sample complexity, the stochastic variants reach perfect generalization in parameter regimes where deterministic gradient descent does not.

Contribution

It provides an analytical comparison of gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm on a highly non-convex loss landscape, highlighting the advantages of stochasticity.

Findings

01

Stochastic variants of gradient descent reach perfect generalization in regions of control parameters where gradient descent does not.

02

Gradient descent can achieve better generalization from less informed initializations.

03

The full trajectories of the algorithms are characterized analytically with dynamical mean-field theory, in the continuous-time limit, with a warm start, and for large system sizes.

Abstract

In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss-landscapes and which of them is able to reach the best generalization error at limited sample complexity. We consider the loss landscape of the high-dimensional phase retrieval problem as a prototypical highly non-convex example. We observe that for phase retrieval the stochastic variants of gradient descent are able to reach perfect generalization for regions of control parameters where the gradient descent algorithm is not. We apply dynamical mean-field theory from statistical physics to characterize analytically the full trajectories of these algorithms in their continuous-time limit, with a warm start, and for large system sizes. We further unveil several intriguing properties of the…
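To make the compared update rules concrete, here is a minimal pure-Python sketch of gradient descent, minibatch SGD, and a Langevin step on a toy phase retrieval instance. It is an illustration under stated assumptions, not the authors' setup: the quartic loss, the 1/sqrt(dim) scaling, the Euler discretization of the Langevin noise, the Bernoulli batch sampling, and every function name (`make_data`, `warm_start`, `overlap`, and so on) are hypothetical choices made here. The persistent-SGD variant and the dynamical mean-field analysis are not reproduced.

```python
import math
import random


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def make_data(n_samples, dim, rng):
    """Gaussian inputs with sign-less labels y = |w_star . x| / sqrt(dim)."""
    w_star = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    xs = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
    ys = [abs(dot(w_star, x)) / math.sqrt(dim) for x in xs]
    return w_star, xs, ys


def loss(w, xs, ys):
    """Mean of (y^2 - h^2)^2 / 4 with h = w . x / sqrt(dim) (assumed form)."""
    scale = math.sqrt(len(w))
    total = 0.0
    for x, y in zip(xs, ys):
        h = dot(w, x) / scale
        total += 0.25 * (y * y - h * h) * (y * y - h * h)
    return total / len(xs)


def grad(w, xs, ys, batch=None):
    """Gradient of the assumed loss, averaged over `batch` (all samples if None)."""
    idx = list(range(len(xs))) if batch is None else list(batch)
    g = [0.0] * len(w)
    if not idx:
        return g  # empty minibatch: no update direction
    scale = math.sqrt(len(w))
    for i in idx:
        h = dot(w, xs[i]) / scale
        coeff = -(ys[i] * ys[i] - h * h) * h / scale
        for k, x_k in enumerate(xs[i]):
            g[k] += coeff * x_k / len(idx)
    return g


def gd_step(w, xs, ys, lr):
    """Full-batch gradient descent step."""
    g = grad(w, xs, ys)
    return [w_k - lr * g_k for w_k, g_k in zip(w, g)]


def sgd_step(w, xs, ys, lr, batch_fraction, rng):
    """SGD step: each sample joins the minibatch with probability batch_fraction."""
    batch = [i for i in range(len(xs)) if rng.random() < batch_fraction]
    g = grad(w, xs, ys, batch)
    return [w_k - lr * g_k for w_k, g_k in zip(w, g)]


def langevin_step(w, xs, ys, lr, temperature, rng):
    """Full-batch step plus Gaussian noise of variance 2 * temperature * lr."""
    g = grad(w, xs, ys)
    noise_scale = math.sqrt(2.0 * temperature * lr)
    return [w_k - lr * g_k + noise_scale * rng.gauss(0.0, 1.0)
            for w_k, g_k in zip(w, g)]


def overlap(w, w_star):
    """|cosine| between w and w_star; the labels cannot fix the global sign."""
    return abs(dot(w, w_star)) / math.sqrt(dot(w, w) * dot(w_star, w_star))


def warm_start(w_star, mix, rng):
    """Hypothetical warm start: random vector nudged toward w_star by `mix`."""
    return [mix * s + (1.0 - mix) * rng.gauss(0.0, 1.0) for s in w_star]


if __name__ == "__main__":
    rng = random.Random(0)
    w_star, xs, ys = make_data(n_samples=60, dim=20, rng=rng)
    w0 = warm_start(w_star, mix=0.3, rng=rng)
    runs = {"GD": w0, "SGD": w0, "Langevin": w0}
    for _ in range(200):
        runs["GD"] = gd_step(runs["GD"], xs, ys, lr=0.2)
        runs["SGD"] = sgd_step(runs["SGD"], xs, ys, 0.2, 0.5, rng)
        runs["Langevin"] = langevin_step(runs["Langevin"], xs, ys, 0.2, 0.01, rng)
    for name, w in runs.items():
        print(name, round(overlap(w, w_star), 3))
```

The three step functions differ only in where randomness enters: none for gradient descent, through the sampled minibatch for SGD, and through additive Gaussian noise for the Langevin step. A toy run at this size says nothing about the high-dimensional regimes analyzed in the paper; the overlap with `w_star` is printed only as a rough proxy for generalization.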
