Convergence Analysis of Newton's Method for Neural Networks in the Overparameterized Limit

Konstantin Riedl; Konstantinos Spiliopoulos; Justin Sirignano

arXiv:2605.08352 · cs.LG · May 21, 2026

TL;DR

This paper analyzes the convergence of the regularized Newton method for training overparameterized neural networks. It proves exponential convergence to the target data and shows that, unlike gradient descent, the method mitigates spectral bias.

Contribution

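The paper develops a convergence analysis in the overparameterized limit that is built on a new Newton neural tangent kernel (NNTK). It gives explicit rates for the convergence of the training dynamics to their infinite-width limit and addresses both spectral bias and the indefiniteness of the Hessian.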
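A regularized Newton step can be sketched as θ ← θ − η(H + εI)⁻¹∇L. Adding εI shifts every Hessian eigenvalue up by ε, so it restores positive definiteness once ε exceeds the magnitude of the most negative eigenvalue. The NumPy sketch below is a generic illustration under that assumption. The helper name `regularized_newton_step`, the step size `lr`, and the toy quadratic are hypothetical and not taken from the paper.

```python
import numpy as np


def regularized_newton_step(theta, grad_fn, hess_fn, eps, lr=1.0):
    """One regularized Newton update: theta - lr * (H + eps*I)^{-1} grad.

    Hypothetical helper for illustration; not the paper's implementation.
    """
    g = grad_fn(theta)
    H = hess_fn(theta)
    A = H + eps * np.eye(theta.size)
    # Cholesky succeeds only for symmetric positive-definite A, so this
    # raises np.linalg.LinAlgError if eps is too small to fix indefiniteness.
    np.linalg.cholesky(A)
    # Solve A d = g instead of forming the inverse explicitly.
    d = np.linalg.solve(A, g)
    return theta - lr * d


# Toy convex quadratic f(theta) = 0.5 theta^T Q theta - b^T theta,
# whose minimizer is Q^{-1} b = [1, 2].
Q = np.array([[3.0, 0.0], [0.0, 1.0]])
b = np.array([3.0, 2.0])
theta = np.zeros(2)
for _ in range(50):
    theta = regularized_newton_step(theta, lambda t: Q @ t - b, lambda t: Q, eps=0.1)
print(theta)  # approximately [1, 2]

# An indefinite Hessian (eigenvalues 1 and -1) needs eps > 1.
H_indef = np.diag([1.0, -1.0])
for eps in (0.5, 2.0):
    try:
        np.linalg.cholesky(H_indef + eps * np.eye(2))
        print(eps, "positive definite")
    except np.linalg.LinAlgError:
        print(eps, "not positive definite")
```

Solving the linear system rather than forming (H + εI)⁻¹ is the usual numerically stable choice, and the Cholesky call doubles as a positive-definiteness check. On the quadratic, each step with `lr=1` contracts the error along an eigendirection with curvature q by the factor ε/(q + ε). A smaller ε therefore brings the update closer to a pure Newton step.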

Findings

01. In the infinite-width limit, the network converges exponentially fast to the target data.

02. The eigenvalues of the regularized NNTK are bounded away from zero, which enables fast convergence even for target data with high-frequency components.

03. The regularization parameter can be scaled to vanish as the network width increases while maintaining positive definiteness.
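The worked equations below show how a Newton-type kernel can avoid spectral bias. They use a standard linearized toy picture rather than the paper's actual NNTK, under three assumptions: squared loss on n training points, a Jacobian J that stays fixed during training, and a Hessian replaced by its Gauss–Newton part plus εI. Under these assumptions, the regularized Newton flow rescales each kernel eigenvalue λ_k to λ_k/(λ_k + ε). That ratio tends to 1 as ε → 0, while for fixed ε, modes with λ_k ≪ ε still decay at a rate close to λ_k/ε.

```latex
% Toy linearized picture (assumption): u(t) = f(X;\theta(t)) \in \mathbb{R}^n,
% J = \partial u / \partial \theta \in \mathbb{R}^{n \times p} held fixed,
% L(\theta) = \tfrac12 \lVert u - y \rVert^2.
\nabla_\theta^2 L = J^\top J + \sum_{i=1}^{n} (u_i - y_i)\, \nabla_\theta^2 u_i
  \qquad \text{(the second term can make the Hessian indefinite)}

% Gradient flow:
\dot\theta = -J^\top (u - y)
  \;\Longrightarrow\; \dot u = -K (u - y), \qquad K = J J^\top

% Regularized Gauss--Newton flow (Hessian replaced by J^\top J + \varepsilon I_p):
\dot\theta = -(J^\top J + \varepsilon I_p)^{-1} J^\top (u - y)
  \;\Longrightarrow\; \dot u = -K (K + \varepsilon I_n)^{-1} (u - y)
% by the push-through identity J (J^\top J + \varepsilon I_p)^{-1} = (J J^\top + \varepsilon I_n)^{-1} J.

% With K = \sum_k \lambda_k v_k v_k^\top, each residual component decays as
\langle v_k, u(t) - y \rangle = e^{-r_k t}\, \langle v_k, u(0) - y \rangle,
\qquad
r_k^{\mathrm{GD}} = \lambda_k,
\qquad
r_k^{\mathrm{Newton}} = \frac{\lambda_k}{\lambda_k + \varepsilon}
  \xrightarrow[\varepsilon \to 0]{} 1 \quad (\lambda_k > 0).
```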

Abstract

A convergence analysis is developed for the regularized Newton method for training neural networks (NNs) in the overparameterized limit. As the number of hidden units tends to infinity, the NN training dynamics converge in probability to the solution of a deterministic limit equation involving a "Newton neural tangent kernel" (NNTK). Explicit rates characterizing this convergence are provided and, in the infinite-width limit, we prove that the NN converges exponentially fast to the target data (i.e., a global minimizer with zero loss). We show that this convergence is uniform across the frequency spectrum, addressing the spectral bias inherent in gradient descent. The eigenvalues of the NTK for gradient descent accumulate at zero, leading to slow convergence for target data with high-frequency components. In contrast, the NNTK has uniformly lower bounded eigenvalues if the…
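The spectral-bias argument above can be made concrete with the linear function-space dynamics du/dt = −K(u − y). Under these dynamics, the residual in the k-th eigenmode of K decays like e^{−λ_k t}, so reducing it by a factor δ takes time ln(1/δ)/λ_k. The NumPy sketch below compares two hypothetical spectra: λ_k = 1/k², which accumulates at zero, and a spectrum bounded below by c = 0.5. Both spectra and the helper names are assumptions for illustration, not values from the paper.

```python
import numpy as np

# Hypothetical spectra chosen for illustration only (not from the paper).
k = np.arange(1, 11)                      # mode index; larger k ~ higher frequency
ntk_eigs = 1.0 / k**2                     # accumulates at zero, like an NTK spectrum
bounded_eigs = np.maximum(ntk_eigs, 0.5)  # uniformly bounded below by c = 0.5


def mode_residual(eigs, t):
    """Relative error left in each eigenmode under du/dt = -K (u - y)."""
    return np.exp(-eigs * t)


def time_to_shrink(eigs, factor=1e-3):
    """Time at which exp(-lambda * t) equals `factor`."""
    return np.log(1.0 / factor) / eigs


t_ntk = time_to_shrink(ntk_eigs)
t_bounded = time_to_shrink(bounded_eigs)
print("slowest mode, 1/k^2 spectrum:", t_ntk.max())        # about 690.8
print("slowest mode, bounded spectrum:", t_bounded.max())  # about 13.8

# Error remaining in mode 10 at t = 20 under each spectrum.
print(mode_residual(ntk_eigs, 20.0)[-1])      # about 0.819
print(mode_residual(bounded_eigs, 20.0)[-1])  # about 4.5e-05
```

Under the 1/k² spectrum, the slowest of the ten modes needs 100 times longer than the first (t ≈ 690.8 versus 6.9). The bounded spectrum caps every mode at ln(1000)/0.5 ≈ 13.8, the kind of frequency-uniform convergence the abstract describes.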