The Neural Tangent Kernel for Classification

Jonathan Plenk; Sergio Calvo-Ordonez; Alvaro Cartea; Yarin Gal; Mark van der Wilk; Kamil Ciosek

arXiv:2605.17606·cs.LG·May 19, 2026

TL;DR

This paper extends Neural Tangent Kernel (NTK) theory to classification, identifying conditions under which wide neural networks stay in the linear (lazy) training regime and relating the distribution of trained predictors to Bayesian model uncertainty.

Contribution

The paper identifies conditions for NTK constancy in classification, namely parameter-space regularization or non-degenerate targets, and explicitly characterizes the solution reached by training.

Findings

01  NTK remains approximately constant during classification training under certain conditions.

02  Parameter-space regularization ensures a stable NTK with cross-entropy loss.

03  The distribution of trained predictors relates to Bayesian model uncertainty.
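
For readers unfamiliar with the terms in the findings above, the standard definitions can be written out as follows. The notation is an assumption made for illustration, since this page does not reproduce the paper's symbols, and the regularized objective in the last line is only one plausible form of parameter-space regularization, not necessarily the one the authors use.

```latex
% Notation is assumed for illustration; it is not taken from the paper.
% f(x; \theta) \in \mathbb{R}^{K}: logits of a network with parameters \theta
% \theta_0: parameters at initialization, \theta_t: parameters at training time t

% Empirical NTK (a K x K matrix for each pair of inputs)
\Theta_t(x, x') = \nabla_\theta f(x; \theta_t)\, \nabla_\theta f(x'; \theta_t)^{\top}

% Lazy regime: the kernel stays close to its value at initialization
\Theta_t(x, x') \approx \Theta_0(x, x') \quad \text{for all } t

% In this regime the logits are well described by the first-order expansion
f(x; \theta_t) \approx f(x; \theta_0) + \nabla_\theta f(x; \theta_0)\,(\theta_t - \theta_0)

% One possible parameter-space regularizer (an assumption, not the paper's stated form)
L(\theta) = \sum_{i=1}^{n} \mathrm{CE}\big(y_i, \operatorname{softmax} f(x_i; \theta)\big)
          + \frac{\lambda}{2}\,\lVert \theta - \theta_0 \rVert_2^2
```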

Abstract

In wide neural networks, the Neural Tangent Kernel (NTK) remains approximately constant during training, providing a powerful theoretical tool for studying training dynamics, generalization, and connections to kernel methods. However, this theory is largely restricted to regression losses. It was previously thought that training on a classification loss, or more generally losses involving nonlinear output transformations, breaks this property, leading to divergent logits and a breakdown of the linearization. In this paper, we extend NTK theory to classification by identifying conditions under which wide neural networks remain in the lazy training regime. We show that parameter-space regularization ensures a constant NTK during training for cross-entropy loss, while in the absence of regularization the regime is recovered when targets are non-degenerate, i.e. when all classes have…
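
To make "the NTK remains approximately constant" concrete, here is a minimal NumPy sketch that computes the empirical NTK of a small multi-class network and measures how much it moves between two parameter settings. The architecture (two-layer tanh network in NTK parameterization) and all function names are assumptions chosen for illustration; the abstract does not specify the authors' setup, and this is not their code.

```python
import numpy as np


def init_params(d, m, k, seed=0):
    """Standard-normal weights for a d -> m -> k network (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    return rng.standard_normal((m, d)), rng.standard_normal((k, m))


def logits(params, x):
    """f(x) = W2 tanh(W1 x / sqrt(d)) / sqrt(m), i.e. NTK parameterization."""
    w1, w2 = params
    m, d = w1.shape
    a = np.tanh(w1 @ x / np.sqrt(d))
    return w2 @ a / np.sqrt(m)


def jacobian(params, x):
    """Jacobian of the k logits w.r.t. all parameters, shape (k, m*d + k*m)."""
    w1, w2 = params
    m, d = w1.shape
    k = w2.shape[0]
    a = np.tanh(w1 @ x / np.sqrt(d))
    # d f_c / d W1[i, l] = W2[c, i] * (1 - a_i^2) * x_l / sqrt(m * d)
    j_w1 = (w2 * (1.0 - a**2))[:, :, None] * x[None, None, :] / np.sqrt(m * d)
    # d f_c / d W2[j, i] = 1[c == j] * a_i / sqrt(m)
    j_w2 = np.eye(k)[:, :, None] * a[None, None, :] / np.sqrt(m)
    return np.concatenate([j_w1.reshape(k, -1), j_w2.reshape(k, -1)], axis=1)


def empirical_ntk(params, x1, x2):
    """k x k kernel block: inner product of the two parameter Jacobians."""
    return jacobian(params, x1) @ jacobian(params, x2).T


def relative_ntk_change(params_a, params_b, x1, x2):
    """Frobenius-norm change of the kernel block, relative to params_a."""
    k_a = empirical_ntk(params_a, x1, x2)
    k_b = empirical_ntk(params_b, x1, x2)
    return np.linalg.norm(k_b - k_a) / np.linalg.norm(k_a)


if __name__ == "__main__":
    x = np.array([0.5, -1.0, 2.0])
    w1, w2 = init_params(d=3, m=512, k=4)
    # Stand-in for a training update: a small perturbation of the first layer.
    moved = (w1 + 0.01 * np.ones_like(w1), w2)
    print(relative_ntk_change((w1, w2), moved, x, x))
```

In an actual experiment the second parameter set would come from training on a classification loss, and the relative change would be tracked as the width `m` grows; a value that shrinks with width is the empirical signature of the lazy regime the abstract describes.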
