The Neural Tangent Kernel for Classification
Jonathan Plenk, Sergio Calvo-Ordonez, Alvaro Cartea, Yarin Gal, Mark van der Wilk, Kamil Ciosek

TL;DR
This paper extends the Neural Tangent Kernel (NTK) theory to classification tasks, showing conditions under which wide neural networks behave linearly during training and providing insights into model uncertainty.
Contribution
The paper identifies conditions under which the NTK stays constant in classification, covering parameter-space regularization and properties of the targets, and characterizes the training solution explicitly.
Findings
The NTK remains approximately constant during classification training when parameter-space regularization is applied or, without regularization, when the targets are non-degenerate.
Parameter-space regularization ensures a stable NTK with cross-entropy loss.
The distribution of trained predictors relates to Bayesian model uncertainty.
Abstract
In wide neural networks, the Neural Tangent Kernel (NTK) remains approximately constant during training, providing a powerful theoretical tool for studying training dynamics, generalization, and connections to kernel methods. However, this theory is largely restricted to regression losses. It was previously thought that training on a classification loss, or more generally losses involving nonlinear output transformations, breaks this property, leading to divergent logits and a breakdown of the linearization. In this paper, we extend NTK theory to classification by identifying conditions under which wide neural networks remain in the lazy training regime. We show that parameter-space regularization ensures a constant NTK during training for cross-entropy loss, while in the absence of regularization the regime is recovered when targets are non-degenerate, i.e. when all classes have…
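For readers unfamiliar with the object under discussion, the empirical NTK at parameters θ is the Gram matrix of output gradients, K(x, x') = ⟨∇θ f(x; θ), ∇θ f(x'; θ)⟩. The sketch below computes it for a toy two-layer tanh network with scalar input and scalar output. The architecture, the 1/sqrt(width) scaling, and the names `init_params`, `grad_f`, and `empirical_ntk` are illustrative assumptions, not the paper's setup or code.

```python
import math
import random

def init_params(width, seed=0):
    """Draw standard-normal weights for the toy network
    f(x) = (1/sqrt(width)) * sum_j a_j * tanh(w_j * x)  (assumed architecture)."""
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return w, a

def grad_f(params, x):
    """Gradient of the scalar output f(x) with respect to all parameters (w, then a)."""
    w, a = params
    scale = 1.0 / math.sqrt(len(w))
    hidden = [math.tanh(wj * x) for wj in w]
    # d f / d w_j = scale * a_j * (1 - tanh(w_j x)^2) * x
    grad_w = [scale * aj * (1.0 - h * h) * x for aj, h in zip(a, hidden)]
    # d f / d a_j = scale * tanh(w_j x)
    grad_a = [scale * h for h in hidden]
    return grad_w + grad_a

def empirical_ntk(params, xs):
    """Gram matrix K[i][j] = <grad f(x_i), grad f(x_j)> at the given parameters."""
    grads = [grad_f(params, x) for x in xs]
    return [[sum(p * q for p, q in zip(gi, gj)) for gj in grads] for gi in grads]

xs = [-1.0, 0.5, 2.0]
params = init_params(width=2048)
K = empirical_ntk(params, xs)
```

Evaluating `empirical_ntk` at the initial parameters and again after some training steps, then comparing the two matrices, is one simple way to probe how far the kernel has moved; the abstract's claim concerns when that change stays small for wide networks trained with a classification loss.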
