Spurious Local Minima of Deep ReLU Neural Networks in the Neural Tangent Kernel Regime
Tohru Nitta

TL;DR
This paper proves that deep ReLU neural networks trained in the Neural Tangent Kernel (NTK) regime do not lie in spurious local minima, so gradient descent is not trapped by such minima in the infinite-width limit.
Contribution
It provides a theoretical proof that deep ReLU networks in the NTK regime lack spurious local minima, clarifying the loss landscape in the infinite-width limit.
Findings
No spurious local minima in the NTK regime for deep ReLU networks.
Gradient descent converges reliably in the infinite-width limit.
Theoretical validation of the NTK regime's optimization landscape.
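As background for the convergence claim, the standard NTK picture of training can be written out explicitly. The derivation below assumes a squared loss $\mathcal{L} = \frac12\|f - y\|^2$ over $n$ training points, gradient flow with learning rate $\eta$, and a tangent kernel $\Theta$ that stays fixed during training, which is what the infinite-width limit provides. These are illustrative assumptions, not necessarily the exact setting or proof technique of the paper.

```latex
\begin{aligned}
\frac{d\theta_t}{dt} &= -\eta\, \nabla_\theta \mathcal{L}(\theta_t) = -\eta\, J_t^{\top}(f_t - y),
  \qquad J_t = \frac{\partial f_t}{\partial \theta} \in \mathbb{R}^{n \times p},\\
\frac{d f_t}{dt} &= J_t \frac{d\theta_t}{dt} = -\eta\, J_t J_t^{\top}(f_t - y) = -\eta\, \Theta_t (f_t - y),\\
\Theta_t &\to \Theta \ \text{(constant) as the widths} \to \infty
  \;\Longrightarrow\; f_t - y = e^{-\eta \Theta t}\,(f_0 - y).
\end{aligned}
```

If $\Theta$ is positive definite, every eigencomponent of the residual decays exponentially, so the training loss tends to zero and the dynamics cannot stall at a point with positive loss.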
Abstract
In this paper, we theoretically prove that deep ReLU neural networks do not lie in spurious local minima of the loss landscape under the Neural Tangent Kernel (NTK) regime, that is, in the gradient descent training dynamics of deep ReLU neural networks whose parameters are initialized from a normal distribution, in the limit where the widths of the hidden layers tend to infinity.
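To make the infinite-width limit concrete, the sketch below computes the limiting NTK Gram matrix of a fully connected ReLU network and then runs kernel gradient descent with it. It assumes the NTK parameterization with standard normal weights, no biases, and the ReLU variance normalization c_sigma = 2, and it uses the closed-form arc-cosine expressions for the ReLU expectations. The function names `relu_ntk` and `kernel_gd` are hypothetical, and the code illustrates the regime the abstract describes rather than the paper's proof.

```python
import numpy as np


def relu_ntk(X, depth):
    """Infinite-width NTK Gram matrix for a fully connected ReLU network.

    Assumes NTK parameterization, N(0, 1) weights, no biases, `depth` hidden
    layers, and ReLU normalization c_sigma = 2 (diagonal entries are preserved).
    """
    sigma = X @ X.T        # Sigma^(0): covariance of the inputs
    theta = sigma.copy()   # Theta^(0)
    for _ in range(depth):
        diag = np.sqrt(np.diag(sigma))
        norm = np.outer(diag, diag)
        # Angle between pre-activations; clip guards against rounding past +/-1.
        angle = np.arccos(np.clip(sigma / norm, -1.0, 1.0))
        # 2 * E[relu'(u) relu'(v)] for (u, v) ~ N(0, Lambda^(h))
        sigma_dot = (np.pi - angle) / np.pi
        # 2 * E[relu(u) relu(v)]: the degree-1 arc-cosine kernel
        sigma = norm * (np.sin(angle) + (np.pi - angle) * np.cos(angle)) / np.pi
        # NTK recursion: Theta^(h) = Theta^(h-1) * Sigma_dot^(h) + Sigma^(h)
        theta = theta * sigma_dot + sigma
    return theta


def kernel_gd(K, y, steps, lr):
    """Discrete-time gradient descent in function space with a fixed kernel.

    Uses squared loss and starts from f_0 = 0 (an assumption; in practice f_0
    is the output of the randomly initialized network).
    """
    f = np.zeros_like(y)
    losses = []
    for _ in range(steps):
        residual = f - y
        losses.append(0.5 * float(residual @ residual))
        # Euler step of df/dt = -lr * K (f - y)
        f = f - lr * K @ residual
    return f, losses


rng = np.random.default_rng(0)
X = rng.standard_normal((8, 5))
X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm, non-parallel inputs
y = rng.standard_normal(8)

K = relu_ntk(X, depth=3)
eigvals = np.linalg.eigvalsh(K)  # eigenvalues in ascending order
f, losses = kernel_gd(K, y, steps=2000, lr=1.0 / eigvals[-1])
print("smallest NTK eigenvalue:", eigvals[0])
print("initial loss:", losses[0], "final loss:", losses[-1])
```

With a positive definite Gram matrix and step size $1/\lambda_{\max}$, each step multiplies every eigencomponent of the residual by a factor in $[0, 1 - \lambda_{\min}/\lambda_{\max}]$, so the loss decreases monotonically toward zero. In this toy setting there is no point with positive loss at which kernel gradient descent can stall.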
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Model Reduction and Neural Networks
Methods: Kaiming Initialization
