On the rate of convergence of a neural network regression estimate learned by gradient descent

Alina Braun; Michael Kohler; Harro Walk

arXiv:1912.03921·math.ST·December 10, 2019·6 cites

Open Access

TL;DR

This paper analyzes the rate of convergence of one-hidden-layer neural network regression estimates trained by gradient descent with multiple random starting values, and shows that they achieve the optimal rate up to a logarithmic factor in a projection pursuit model.

Contribution

It defines an estimate that runs gradient descent from several randomly chosen starting weights, keeps the run with the smallest empirical $L_{2}$ risk, and proves a near-optimal rate of convergence for that estimate.

Findings

01

Achieves the optimal rate of convergence, up to a logarithmic factor, in a projection pursuit model.

02

Illustrates the finite-sample performance of the estimates on simulated data.

03

Provides theoretical guarantees for the proposed estimation procedure.

Abstract

Nonparametric regression with random design is considered. Estimates are defined by minimizing a penalized empirical $L_{2}$ risk over a suitably chosen class of neural networks with one hidden layer via gradient descent. Here, the gradient descent procedure is repeated several times with randomly chosen starting values for the weights, and from the list of constructed estimates the one with the minimal empirical $L_{2}$ risk is chosen. Under the assumption that the number of randomly chosen starting values and the number of steps for gradient descent are sufficiently large, it is shown that the resulting estimate achieves (up to a logarithmic factor) the optimal rate of convergence in a projection pursuit model. The finite sample size performance of the estimates is illustrated by using simulated data.
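
The procedure described in the abstract can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the logistic activation, the ridge-type penalty on the outer weights, the zero initialization of the outer weights, the uniform initialization of the inner weights, and all names and default values (`fit_with_restarts`, `n_starts`, `K`, `steps`, `lr`, `penalty`, `scale`) are assumptions made for the example. It runs gradient descent on a penalized empirical $L_{2}$ risk from several random starting values and keeps the run with the smallest (unpenalized) empirical $L_{2}$ risk.

```python
import math
import random


def sigmoid(z):
    # Logistic squasher; the argument is clamped to avoid overflow in exp.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict(params, x):
    # One hidden layer: c0 + sum_k c[k] * sigmoid(<w[k], x> + b[k]).
    c0, c, w, b = params
    return c0 + sum(
        ck * sigmoid(sum(wj * xj for wj, xj in zip(wk, x)) + bk)
        for ck, wk, bk in zip(c, w, b)
    )


def empirical_l2_risk(params, xs, ys):
    return sum((predict(params, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def random_start(rng, K, d, scale):
    # Assumption of this sketch: outer weights start at zero,
    # inner weights and biases are uniform on [-scale, scale].
    w = [[rng.uniform(-scale, scale) for _ in range(d)] for _ in range(K)]
    b = [rng.uniform(-scale, scale) for _ in range(K)]
    return 0.0, [0.0] * K, w, b


def gradient_descent(params, xs, ys, steps, lr, penalty):
    # Minimizes empirical L2 risk + penalty * sum_k c[k]^2 (hypothetical penalty form).
    c0, c, w, b = params
    c, b, w = list(c), list(b), [list(wk) for wk in w]
    n, K, d = len(xs), len(c), len(xs[0])
    for _ in range(steps):
        g_c0 = 0.0
        g_c = [2.0 * penalty * ck for ck in c]
        g_w = [[0.0] * d for _ in range(K)]
        g_b = [0.0] * K
        for x, y in zip(xs, ys):
            h = [sigmoid(sum(wj * xj for wj, xj in zip(w[k], x)) + b[k])
                 for k in range(K)]
            # r = derivative of the averaged squared error w.r.t. the prediction.
            r = 2.0 * (c0 + sum(ck * hk for ck, hk in zip(c, h)) - y) / n
            g_c0 += r
            for k in range(K):
                g_c[k] += r * h[k]
                s = r * c[k] * h[k] * (1.0 - h[k])  # chain rule through sigmoid
                g_b[k] += s
                for j in range(d):
                    g_w[k][j] += s * x[j]
        # All gradients were computed at the old weights; now take the step.
        c0 -= lr * g_c0
        for k in range(K):
            c[k] -= lr * g_c[k]
            b[k] -= lr * g_b[k]
            for j in range(d):
                w[k][j] -= lr * g_w[k][j]
    return c0, c, w, b


def fit_with_restarts(xs, ys, n_starts=5, K=4, steps=300, lr=0.05,
                      penalty=1e-4, scale=3.0, seed=0):
    rng = random.Random(seed)
    d = len(xs[0])
    best, best_risk, risks = None, float("inf"), []
    for _ in range(n_starts):
        start = random_start(rng, K, d, scale)
        params = gradient_descent(start, xs, ys, steps, lr, penalty)
        risk = empirical_l2_risk(params, xs, ys)
        risks.append(risk)
        if risk < best_risk:  # keep the run with minimal empirical L2 risk
            best, best_risk = params, risk
    return best, best_risk, risks
```

Calling `fit_with_restarts(xs, ys)` with `xs` a list of feature lists and `ys` a list of responses returns the selected weights, their empirical $L_{2}$ risk, and the risks of all runs. The abstract requires the number of starting values and the number of gradient descent steps to be sufficiently large; the small defaults here are chosen only to keep the example fast.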

Taxonomy

Topics: Neural Networks and Applications · Stochastic Gradient Optimization Techniques · Domain Adaptation and Few-Shot Learning