On the rate of convergence of a neural network regression estimate learned by gradient descent
Alina Braun, Michael Kohler, Harro Walk

TL;DR
This paper analyzes the rate of convergence of one-hidden-layer neural network regression estimates trained by gradient descent from multiple random starting values, and shows that they achieve the optimal rate in a projection pursuit model up to a logarithmic factor.
Contribution
The estimate is computed by repeating gradient descent from several randomly chosen starting weights and keeping the fit with the minimal empirical risk. For this procedure the paper proves a near-optimal rate of convergence.
Findings
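Achieves the optimal rate of convergence in a projection pursuit model, up to a logarithmic factor, provided the number of random starts and the number of gradient descent steps are sufficiently large.
Illustrates the finite-sample performance of the estimates on simulated data.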
Provides theoretical guarantees for the proposed estimation procedure.
Abstract
Nonparametric regression with random design is considered. Estimates are defined by minimizing a penalized empirical risk over a suitably chosen class of neural networks with one hidden layer via gradient descent. Here, the gradient descent procedure is repeated several times with randomly chosen starting values for the weights, and from the list of constructed estimates the one with the minimal empirical risk is chosen. Under the assumption that the number of randomly chosen starting values and the number of steps for gradient descent are sufficiently large, it is shown that the resulting estimate achieves (up to a logarithmic factor) the optimal rate of convergence in a projection pursuit model. The finite sample size performance of the estimates is illustrated by using simulated data.
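The procedure described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the sigmoid activation, the ridge-type penalty on the outer weights, the Gaussian initialization of the inner weights, the step size, and all function names (`predict`, `empirical_risk`, `gradient_descent`, `multi_start_fit`) are assumptions made here for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(params, X):
    # One hidden layer: f(x) = sum_k c_k * sigmoid(w_k^T x + b_k) + c0
    W, b, c, c0 = params
    return sigmoid(X @ W.T + b) @ c + c0

def empirical_risk(params, X, y):
    # Unpenalized mean squared error, used to compare the different starts
    return float(np.mean((predict(params, X) - y) ** 2))

def gradient_descent(params, X, y, steps, lr, penalty):
    # Minimizes mean squared error + penalty * ||c||^2 (assumed penalty form)
    W, b, c, c0 = (np.array(p, dtype=float) for p in params)
    n = len(y)
    for _ in range(steps):
        H = sigmoid(X @ W.T + b)                 # hidden activations, shape (n, K)
        r = H @ c + c0 - y                       # residuals, shape (n,)
        D = r[:, None] * c[None, :] * H * (1.0 - H)
        grad_W = (2.0 / n) * (D.T @ X)           # shape (K, d)
        grad_b = (2.0 / n) * D.sum(axis=0)
        grad_c = (2.0 / n) * (H.T @ r) + 2.0 * penalty * c
        grad_c0 = (2.0 / n) * r.sum()
        W = W - lr * grad_W
        b = b - lr * grad_b
        c = c - lr * grad_c
        c0 = c0 - lr * grad_c0
    return W, b, c, c0

def multi_start_fit(X, y, n_starts=10, n_hidden=5, steps=500,
                    lr=0.05, penalty=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    best, best_risk = None, np.inf
    for _ in range(n_starts):
        # Random inner weights, zero outer weights (an assumed initialization)
        init = (rng.normal(size=(n_hidden, d)), rng.normal(size=n_hidden),
                np.zeros(n_hidden), 0.0)
        params = gradient_descent(init, X, y, steps, lr, penalty)
        risk = empirical_risk(params, X, y)
        if risk < best_risk:                     # keep the start with minimal empirical risk
            best, best_risk = params, risk
    return best, best_risk
```

Calling `multi_start_fit(X, y)` on an `(n, d)` design matrix and a length-`n` response returns the selected weights together with their empirical risk. The selection step compares the unpenalized empirical risk, following the wording of the abstract, while the penalty enters only the gradient descent objective.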
Taxonomy
Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning
