Spurious Local Minima of Deep ReLU Neural Networks in the Neural Tangent Kernel Regime

Tohru Nitta

arXiv:1806.04884·stat.ML·May 20, 2022

TL;DR

This paper proves that deep ReLU neural networks trained by gradient descent in the Neural Tangent Kernel (NTK) regime, that is, with normally initialized parameters in the limit of infinitely wide hidden layers, do not end up in spurious local minima of the loss landscape.

Contribution

It provides a theoretical proof that deep ReLU networks in the NTK regime lack spurious local minima, clarifying the loss landscape in the infinite-width limit.

Findings

01

No spurious local minima in the NTK regime for deep ReLU networks.

02

Gradient descent training does not get trapped in spurious local minima in the infinite-width limit.

03

A theoretical characterization of the loss landscape under the NTK regime.

Abstract

In this paper, we theoretically prove that the deep ReLU neural networks do not lie in spurious local minima in the loss landscape under the Neural Tangent Kernel (NTK) regime, that is, in the gradient descent training dynamics of the deep ReLU neural networks whose parameters are initialized by a normal distribution in the limit as the widths of the hidden layers tend to infinity.
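To make the setting in the abstract more concrete, the following is a minimal sketch of the empirical (finite-width) tangent kernel of a deep ReLU network whose weights are drawn from a standard normal distribution. It is an illustration only, not the paper's proof or code: the paper concerns the infinite-width limit, while this example uses finite widths. The function names (`init_params`, `forward_and_grad`, `empirical_ntk`), the layer widths, the absence of biases, and the 1/sqrt(fan_in) scaling in the forward pass are all assumptions chosen for the example.

```python
import numpy as np

def init_params(widths, rng):
    # Assumption: every weight is drawn i.i.d. from N(0, 1); the
    # 1/sqrt(fan_in) factor is applied in the forward pass instead.
    return [rng.standard_normal((n_out, n_in))
            for n_in, n_out in zip(widths[:-1], widths[1:])]

def forward_and_grad(params, x):
    """Return the scalar output f(x) and the flattened gradient of f
    with respect to all weights (manual backpropagation)."""
    acts = [x]   # input fed into each layer
    pres = []    # pre-activation of each layer
    h = x
    for i, W in enumerate(params):
        z = W @ h / np.sqrt(W.shape[1])
        pres.append(z)
        if i < len(params) - 1:      # ReLU on hidden layers only
            h = np.maximum(z, 0.0)
            acts.append(h)
    out = pres[-1]                   # shape (1,)

    grads = [None] * len(params)
    delta = np.ones_like(out)        # d f / d z at the output layer
    for i in reversed(range(len(params))):
        W = params[i]
        scale = 1.0 / np.sqrt(W.shape[1])
        grads[i] = np.outer(delta, acts[i]) * scale
        if i > 0:
            # Propagate through the linear map, then through the ReLU.
            delta = (W.T @ delta) * scale * (pres[i - 1] > 0)
    return out[0], np.concatenate([g.ravel() for g in grads])

def empirical_ntk(params, X):
    # Kernel entry (a, b) is the inner product of the parameter
    # gradients of f at inputs X[a] and X[b].
    J = np.stack([forward_and_grad(params, x)[1] for x in X])
    return J @ J.T

rng = np.random.default_rng(0)
widths = [3, 64, 64, 1]              # hypothetical architecture
params = init_params(widths, rng)
X = rng.standard_normal((4, 3))      # four random inputs
K = empirical_ntk(params, X)
print(K.shape)
```

By construction `K` is a symmetric positive semidefinite Gram matrix. In the NTK literature this kernel is studied as the hidden widths grow; the sketch above only shows how the finite-width object is formed.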

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks

Methods: Kaiming Initialization