Improved Overparametrization Bounds for Global Convergence of Stochastic Gradient Descent for Shallow Neural Networks
Bartłomiej Polaczyk, Jacek Cyranka

TL;DR
This paper tightens the theoretical bounds on the overparametrization (hidden-layer width) needed for stochastic gradient descent to converge globally when training shallow neural networks, including those with ReLU activations, using a new proof technique.
Contribution
It introduces a new proof method combining nonlinear analysis and random initialization properties to tighten overparametrization bounds for convergence.
Findings
Established global convergence of continuous solutions of the differential inclusion that serves as a nonsmooth analogue of the gradient flow for the MSE loss.
Proved linear convergence of stochastic gradient descent towards zero loss.
Reduced the hidden-layer width required for convergence relative to existing state-of-the-art results.
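The setting behind these findings can be sketched numerically. The following is a minimal illustration, not the authors' construction: it assumes a one-hidden-layer ReLU network with a 1/sqrt(m) output scaling, fixed random output signs, Gaussian hidden weights, unit-norm inputs, and single-sample SGD on the MSE loss. The function name `train_shallow_relu` and all hyperparameters are hypothetical choices made for the example.

```python
import numpy as np

def train_shallow_relu(n=10, d=5, m=1000, lr=0.2, steps=2000, seed=0):
    """Single-sample SGD on a wide one-hidden-layer ReLU network (sketch).

    Assumptions (not taken from the paper): 1/sqrt(m) output scaling,
    fixed +/-1 output weights, only the hidden layer is trained.
    Returns the full-batch MSE loss recorded before each step.
    """
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    y = rng.normal(size=n)                          # arbitrary targets

    W = rng.normal(size=(m, d))                     # hidden weights (trained)
    a = rng.choice([-1.0, 1.0], size=m)             # output weights (fixed)

    losses = []
    for _ in range(steps):
        pre = X @ W.T                               # pre-activations, shape (n, m)
        out = np.maximum(pre, 0.0) @ a / np.sqrt(m) # network outputs, shape (n,)
        losses.append(0.5 * np.mean((out - y) ** 2))

        i = rng.integers(n)                         # sample one training point
        residual = out[i] - y[i]
        # One element of the subdifferential: we pick relu'(0) = 0.
        active = (pre[i] > 0).astype(float)
        grad = (residual / np.sqrt(m)) * (a * active)[:, None] * X[i][None, :]
        W -= lr * grad
    return losses

if __name__ == "__main__":
    losses = train_shallow_relu()
    print("initial loss:", losses[0])
    print("final loss:  ", losses[-1])
```

Plotting `losses` on a logarithmic axis is one way to check whether the decay looks roughly geometric, which is what linear convergence predicts. Shrinking `m` in this sketch gives a rough, informal feel for why hidden-layer width matters, though it says nothing about the precise bounds proved in the paper.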
Abstract
We study the overparametrization bounds required for the global convergence of the stochastic gradient descent algorithm for a class of one-hidden-layer feed-forward neural networks, considering most of the activation functions used in practice, including ReLU. We improve the existing state-of-the-art results in terms of the required hidden layer width. We introduce a new proof technique combining nonlinear analysis with properties of random initializations of the network. First, we establish the global convergence of continuous solutions of the differential inclusion being a nonsmooth analogue of the gradient flow for the MSE loss. Second, we provide a technical result (working also for general approximators) relating solutions of the aforementioned differential inclusion to the (discrete) stochastic gradient descent sequences, hence establishing linear convergence towards zero loss for…
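For orientation, the objects named in the abstract can be written out as follows. The notation is assumed for illustration and is not taken from the paper: $m$ is the hidden-layer width, $\sigma$ the activation, $\theta$ the trainable parameters, $\eta$ the step size, and $\partial$ a generalized subdifferential (the Clarke subdifferential is a common choice for nonsmooth activations such as ReLU). The $1/\sqrt{m}$ scaling and the $1/(2n)$ normalization are likewise assumptions.

```latex
% One-hidden-layer network and MSE loss (assumed scaling and normalization)
f(x;\theta) = \frac{1}{\sqrt{m}} \sum_{j=1}^{m} a_j\, \sigma\bigl(\langle w_j, x\rangle\bigr),
\qquad
L(\theta) = \frac{1}{2n} \sum_{i=1}^{n} \bigl(f(x_i;\theta) - y_i\bigr)^2 .

% Differential inclusion: nonsmooth analogue of the gradient flow
\dot{\theta}(t) \in -\,\partial L\bigl(\theta(t)\bigr)
\quad \text{for almost every } t \ge 0 .

% Discrete SGD sequence with sampled index i_k
\theta_{k+1} = \theta_k - \eta\, g_k,
\qquad
g_k \in \partial \ell_{i_k}(\theta_k),
\qquad
\ell_i(\theta) = \tfrac{1}{2}\bigl(f(x_i;\theta) - y_i\bigr)^2 .
```

When $\sigma$ is differentiable, $\partial L$ reduces to the usual gradient and the inclusion becomes the ordinary gradient flow $\dot{\theta} = -\nabla L(\theta)$. In this notation, linear convergence towards zero loss means that $L(\theta_k)$ decays geometrically in $k$, in whatever probabilistic sense the paper makes precise.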
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Neural Networks and Applications
