Convergence of constant step stochastic gradient descent for non-smooth non-convex functions

Pascal Bianchi (S2A, IDS, IP Paris); Walid Hachem (LIGM); Sholom Schechtman (LIGM)

arXiv:2005.08513 · math.NA · April 13, 2022

TL;DR

This paper analyzes the long-term behavior of constant step stochastic gradient descent for non-smooth, non-convex functions, showing convergence to critical points without requiring an oracle for the Clarke subdifferential.

Contribution

It proves that no oracle is needed for convergence, and establishes probabilistic convergence of the algorithm's trajectory to the set of solutions of a differential inclusion.

Findings

1. The interpolated trajectory of the iterates converges in probability to the set of solutions of a differential inclusion.

2. The invariant distributions of the Markov chain converge as the step size decreases.

3. The algorithm's iterates tend to the critical points of the mean function.

Abstract

This paper studies the asymptotic behavior of the constant step Stochastic Gradient Descent for the minimization of an unknown function F, defined as the expectation of a non-convex, non-smooth, locally Lipschitz random function. As the gradient may not exist, it is replaced by a certain operator: a reasonable choice is to use an element of the Clarke subdifferential of the random function; another choice is the output of the celebrated backpropagation algorithm, which is popular amongst practitioners, and whose properties have recently been studied by Bolte and Pauwels [7]. Since the expectation of the chosen operator is not in general an element of the Clarke subdifferential ∂F of the mean function, it has been assumed in the literature that an oracle of ∂F is available. As a first result, it is shown in this paper that such an oracle is not needed for almost all initialization…
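To make the setting in the abstract concrete, here is a minimal Python sketch of constant step SGD on a toy problem. Everything specific in it is an assumption chosen for illustration and does not come from the paper: the random function f(x, xi) = |x^2 - 1| + xi * x with standard Gaussian xi (so the mean function is F(x) = |x^2 - 1|, which is non-convex and non-smooth at x = ±1), the names `subgradient_selection` and `constant_step_sgd`, and the step size used in the usage note. The update uses one element of the Clarke subdifferential of the random function at each step, with no oracle of ∂F.

```python
import random


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def subgradient_selection(x, xi):
    """One element of the Clarke subdifferential of the toy random function
    f(x, xi) = |x^2 - 1| + xi * x  (hypothetical example, not from the paper).

    Where x^2 != 1 this is the ordinary derivative. At the kinks x = +/-1,
    sign(0) = 0 picks the value xi, which lies in the Clarke subdifferential
    [-2, 2] + xi of f(., xi) at those points.
    """
    return sign(x * x - 1.0) * 2.0 * x + xi


def constant_step_sgd(x0, gamma, n_iter, seed=0):
    """Run x_{n+1} = x_n - gamma * g(x_n, xi_{n+1}) with a fixed step gamma.

    Returns the whole list of iterates, so the path can be inspected or
    interpolated afterwards.
    """
    rng = random.Random(seed)
    path = [float(x0)]
    for _ in range(n_iter):
        xi = rng.gauss(0.0, 1.0)  # zero-mean noise, an assumption of this toy
        path.append(path[-1] - gamma * subgradient_selection(path[-1], xi))
    return path
```

As a usage example, `constant_step_sgd(2.0, 0.01, 5000)` starts at x = 2 and, in this toy setting, ends up hovering in a small neighbourhood of the critical point x = 1 rather than converging exactly, which is the kind of behavior one expects when the step size is not decreased.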

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Extracellular vesicles in disease · Sparse and Compressive Sensing Techniques