Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit
Konstantin Riedl, Konstantinos Spiliopoulos, Justin Sirignano

TL;DR
This paper analyzes the convergence of the regularized Newton method for training overparameterized neural networks, proving exponential convergence to the target data and showing that the method mitigates the spectral bias of gradient descent.
Contribution
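It develops a convergence analysis for the regularized Newton method in the overparameterized limit. The limiting training dynamics are governed by a new Newton neural tangent kernel (NNTK), and explicit convergence rates are provided. The analysis addresses both spectral bias and the indefiniteness of the Hessian.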
Findings
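In the infinite-width limit, neural networks trained with the regularized Newton method converge exponentially fast to the target data.
The eigenvalues of the regularized NNTK are bounded away from zero, whereas the NTK eigenvalues for gradient descent accumulate at zero; this enables faster convergence for target data with high-frequency components.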
The regularization parameter can be scaled to vanish as network width increases, maintaining positive definiteness.
Abstract
A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the NNTK has uniformly lower bounded eigenvalues if the…
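The contrast between eigenvalues that accumulate at zero and eigenvalues that are uniformly bounded below can be illustrated with a toy calculation. The sketch below is not the paper's NNTK or its limit equation. It assumes a hypothetical kernel spectrum `eigvals` decaying like 1/k^2, assumes that each eigen-direction of the residual decays as exp(-rate * t), and models a damped Newton-type step by the assumed effective rate lambda / (lambda + rho), where `rho` is a hypothetical regularization parameter chosen small relative to the smallest eigenvalue.

```python
import numpy as np

def residual_after(rates, t):
    """Residual size per eigen-direction at time t, assuming r_k(t) = exp(-rate_k * t)."""
    return np.exp(-rates * t)

# Hypothetical spectrum (assumption): eigenvalue of mode k decays like 1/k^2,
# so high-frequency modes (large k) have eigenvalues close to zero.
k = np.arange(1, 51)
eigvals = 1.0 / k**2

# Hypothetical regularization (assumption), small compared with min(eigvals).
rho = 1e-6

# Toy gradient-flow rates: each mode decays at a rate equal to its eigenvalue.
gd_rates = eigvals

# Toy damped Newton-type rates (assumption): preconditioning by (lambda + rho)
# rescales every mode to a rate close to 1.
newton_rates = eigvals / (eigvals + rho)

t = 10.0
gd_residual = residual_after(gd_rates, t)
newton_residual = residual_after(newton_rates, t)

print("slowest gradient-flow rate:", gd_rates.min())
print("slowest Newton-type rate:  ", newton_rates.min())
print("largest remaining residual, gradient flow:", gd_residual.max())
print("largest remaining residual, Newton-type:  ", newton_residual.max())
```

In this toy model the highest-frequency mode is almost untouched by gradient flow at time `t`, while every mode has decayed substantially under the preconditioned rates. The behavior depends on `rho` staying small relative to the smallest eigenvalue, which is a property of this sketch and not a statement of the paper's conditions.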
