Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot, Franck Gabriel, Clément Hongler

TL;DR
This paper introduces the Neural Tangent Kernel (NTK), a theoretical framework for the training dynamics and generalization of infinitely wide neural networks. It shows that, during training, the network function follows kernel gradient descent with respect to the NTK, and it relates convergence of training to the positive-definiteness of the limiting kernel.
Contribution
The paper proves that, in the infinite-width limit, the NTK converges to a fixed limiting kernel at initialization and stays constant during training. This links neural network training to kernel methods and provides insights into generalization and early stopping.
Findings
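The NTK converges to a fixed limiting kernel in the infinite-width limit and stays constant during training.
For least-squares regression, the network function follows a linear differential equation during training in the infinite-width limit.
The limiting NTK is proven positive-definite for data on the sphere when the non-linearity is non-polynomial.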
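The findings above can be illustrated numerically. The following is a minimal sketch, not the authors' code: it assumes a one-hidden-layer ReLU network without biases, scaled by 1/sqrt(width) in the forward pass, and a least-squares cost averaged over the training points. The function names (`init_params`, `forward`, `param_gradients`, `empirical_ntk`, `kernel_flow_predictions`) are hypothetical. It computes the empirical NTK at initialization as a Gram matrix of parameter gradients, then solves the linear dynamics in closed form while treating that kernel as constant, which is only exact in the infinite-width limit.

```python
import numpy as np

def init_params(d_in, width, rng):
    # Standard Gaussian weights; the 1/sqrt(.) scaling lives in the forward pass.
    W = rng.standard_normal((width, d_in))
    a = rng.standard_normal(width)
    return W, a

def forward(X, W, a):
    # f(x) = a . relu(W x / sqrt(d)) / sqrt(m), scalar output, no biases (assumption).
    d, m = X.shape[1], W.shape[0]
    return np.maximum(X @ W.T / np.sqrt(d), 0.0) @ a / np.sqrt(m)

def param_gradients(X, W, a):
    # Gradient of f(x_i) with respect to all parameters, one row per input x_i.
    n, d = X.shape
    m = W.shape[0]
    pre = X @ W.T / np.sqrt(d)                     # (n, m) pre-activations
    grad_a = np.maximum(pre, 0.0) / np.sqrt(m)     # df/da_j
    dact = (pre > 0).astype(X.dtype)               # ReLU derivative
    # df/dW_jk = a_j * relu'(pre_j) * x_k / sqrt(m * d)
    grad_W = (a * dact)[:, :, None] * X[:, None, :] / np.sqrt(m * d)
    return np.concatenate([grad_W.reshape(n, -1), grad_a], axis=1)

def empirical_ntk(X, W, a):
    # NTK Gram matrix: inner products of parameter gradients.
    J = param_gradients(X, W, a)
    return J @ J.T

def kernel_flow_predictions(K, f0, y, t):
    # Closed-form solution of df/dt = -(1/n) K (f - y) on the training points,
    # valid when K is held constant over training.
    n = len(y)
    evals, evecs = np.linalg.eigh(K)
    decay = np.exp(-evals * t / n)
    return y + evecs @ (decay * (evecs.T @ (f0 - y)))

rng = np.random.default_rng(0)
X = rng.standard_normal((5, 3))
X /= np.linalg.norm(X, axis=1, keepdims=True)      # place inputs on the unit sphere
y = rng.standard_normal(5)
W, a = init_params(3, 4096, rng)
K = empirical_ntk(X, W, a)
f0 = forward(X, W, a)
f_t = kernel_flow_predictions(K, f0, y, t=50.0)
```

Each eigen-direction of `K` decays at its own rate, so directions with large eigenvalues are fitted first. Stopping at a finite `t` therefore leaves the low-eigenvalue directions mostly unfitted, which is one way to read the link to early stopping.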
Abstract
At initialization, artificial neural networks (ANNs) are equivalent to Gaussian processes in the infinite-width limit, thus connecting them to kernel methods. We prove that the evolution of an ANN during training can also be described by a kernel: during gradient descent on the parameters of an ANN, the network function (which maps input vectors to output vectors) follows the kernel gradient of the functional cost (which is convex, in contrast to the parameter cost) w.r.t. a new kernel: the Neural Tangent Kernel (NTK). This kernel is central to describe the generalization features of ANNs. While the NTK is random at initialization and varies during training, in the infinite-width limit it converges to an explicit limiting kernel and it stays constant during training. This makes it possible to study the training of ANNs in function space instead of parameter space. Convergence…
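For readers who want the kernel written out, here is a short sketch of the definition for a scalar-output network. The notation is assumed for illustration and is not taken from the text above: parameters \theta_1, ..., \theta_P, training pairs (x_j, y_j) for j = 1, ..., N, and a least-squares cost averaged over the training points. The second line follows from applying the chain rule to gradient flow on the parameters.

```latex
% Neural Tangent Kernel of a scalar-output network f_\theta
\Theta_\theta(x, x') = \sum_{p=1}^{P} \partial_{\theta_p} f_\theta(x)\, \partial_{\theta_p} f_\theta(x')

% Evolution of the network function under gradient flow on
% C(\theta) = \frac{1}{2N} \sum_{j=1}^{N} \bigl( f_\theta(x_j) - y_j \bigr)^2
\partial_t f_{\theta(t)}(x) = -\frac{1}{N} \sum_{j=1}^{N} \Theta_{\theta(t)}(x, x_j)\, \bigl( f_{\theta(t)}(x_j) - y_j \bigr)
```

When \Theta_{\theta(t)} is replaced by a kernel that does not depend on t, as in the infinite-width limit, the second equation becomes linear in f.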
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
Methods: Neural Tangent Kernel
