Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks

Bartłomiej Polaczyk; Jacek Cyranka

arXiv:2201.12052 · cs.LG · November 17, 2022 · 1 citation

Open Access

TL;DR

This paper tightens the bounds on the hidden-layer width (overparametrization) required for stochastic gradient descent to converge globally when training shallow, one-hidden-layer neural networks. The results cover most activation functions used in practice, including ReLU, and rely on a new proof technique.

Contribution

It introduces a new proof method combining nonlinear analysis and random initialization properties to tighten overparametrization bounds for convergence.

Findings

1. Established global convergence of continuous solutions of a differential inclusion that serves as a nonsmooth analogue of gradient flow for the MSE loss.

2. Proved linear convergence of stochastic gradient descent towards zero loss.

3. Improved the required hidden-layer width (overparametrization bounds) relative to previous results.
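
To make the objects named in the findings concrete, the following is a sketch of one common way to write them down. The notation is an assumption for illustration and is not taken from the paper: the $1/\sqrt{m}$ scaling, the fixed output weights $a_r$, the choice of a generalized (for example Clarke) subdifferential $\partial$, and the constants $C$ and $\rho$ are all placeholders.

```latex
% One-hidden-layer network of width m with activation \sigma (e.g. ReLU)
f(x; W) = \frac{1}{\sqrt{m}} \sum_{r=1}^{m} a_r \, \sigma\!\left(w_r^{\top} x\right)

% Squared (MSE-type) loss on training data (x_i, y_i), i = 1, \dots, n
L(W) = \frac{1}{2} \sum_{i=1}^{n} \bigl( f(x_i; W) - y_i \bigr)^2

% Nonsmooth analogue of gradient flow: a differential inclusion
\dot{W}(t) \in -\partial L\bigl(W(t)\bigr) \quad \text{for almost every } t \ge 0

% Discrete SGD sequence with step size \eta and stochastic (sub)gradient g_k
W_{k+1} = W_k - \eta \, g_k

% Linear convergence towards zero loss: a bound of this shape,
% for some C > 0 and \rho \in (0, 1), in a suitable probabilistic sense
L(W_k) \le C \, \rho^{k} \, L(W_0)
```

When $\sigma$ is ReLU, $L$ is not differentiable everywhere, which is why the continuous-time dynamics are stated as an inclusion rather than an ordinary differential equation.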

Abstract

We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU. We improve the existing state-of-the-art results in terms of the required hidden layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network. First, we establish the global convergence of continuous solutions of a differential inclusion that is a nonsmooth analogue of the gradient flow for the MSE loss. Second, we provide a technical result (working also for general approximators) relating solutions of the aforementioned differential inclusion to the (discrete) stochastic gradient descent sequences, hence establishing linear convergence towards zero loss for…
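
The setting described in the abstract can be illustrated with a small numerical experiment. The sketch below is not the authors' code and does not reproduce their bounds; it simply runs single-sample SGD on a wide one-hidden-layer ReLU network with a squared loss and records the loss after every step. The function name `train_shallow_relu`, the $1/\sqrt{m}$ output scaling, the fixed random-sign output weights, the synthetic data, and all default hyperparameters are assumptions chosen for illustration.

```python
import numpy as np


def train_shallow_relu(n=8, d=10, m=1024, lr=0.5, steps=2000, seed=0):
    """Single-sample SGD on a wide one-hidden-layer ReLU network.

    Illustrative assumptions (not from the paper): unit-norm Gaussian
    inputs, Gaussian targets, fixed +/-1 output weights, 1/sqrt(m) scaling,
    and only the hidden-layer weights W are trained.
    """
    rng = np.random.default_rng(seed)

    # Synthetic training set: n points on the unit sphere in R^d.
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    y = rng.standard_normal(n)

    # Random initialization of the hidden layer; output weights stay fixed.
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)

    def predict(W):
        # Network outputs on all training points, shape (n,).
        return np.maximum(X @ W.T, 0.0) @ a / np.sqrt(m)

    def squared_loss(W):
        # 0.5 * sum of squared residuals (MSE up to a constant factor).
        return 0.5 * np.sum((predict(W) - y) ** 2)

    losses = [squared_loss(W)]
    for _ in range(steps):
        i = rng.integers(n)                # draw one training index
        pre = W @ X[i]                     # pre-activations, shape (m,)
        residual = np.maximum(pre, 0.0) @ a / np.sqrt(m) - y[i]
        active = (pre > 0).astype(float)   # subgradient choice: relu'(0) = 0
        # Subgradient of 0.5 * residual**2 with respect to W, shape (m, d).
        grad = (residual / np.sqrt(m)) * (a * active)[:, None] * X[i][None, :]
        W -= lr * grad
        losses.append(squared_loss(W))
    return np.array(losses)


if __name__ == "__main__":
    losses = train_shallow_relu()
    print(f"initial loss: {losses[0]:.3e}, final loss: {losses[-1]:.3e}")
```

Plotting `np.log(losses)` against the step index gives a rough visual check: under linear convergence the curve should look approximately like a straight line with negative slope. Varying `m` for fixed `n` is one way to explore informally how the width affects this behavior.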

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications