The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning

James Hazelden; Laura Driscoll; Eli Shlizerman; Eric Shea-Brown

arXiv:2605.08746 · cs.LG · May 12, 2026

TL;DR

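This paper introduces the Global Empirical Neural Tangent Kernel (NTK), the linear operator that governs first-order gradient descent updates to a model's internal state. It characterizes the kernel's structure, bias, and low rank across several model classes, uses these properties to explain gradient descent learning dynamics, and provides a library for NTK computations.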

Contribution

The paper derives a tractable, exact form of the NTK for diverse models, identifies the structural bottlenecks and bias in that form, and provides a practical library for NTK computations.

Findings

01

The NTK is structurally bottlenecked, which constrains its effective rank.

02

Gradient descent preferentially learns within dominant modes of activity.

03

Model dynamics at initialization bias the NTK, limiting learning effectiveness.

Abstract

In training a neural network with gradient descent (GD), each iteration induces a linear operator that governs first-order updates to a model's internal state variables. We define this operator as the Global Empirical Neural Tangent Kernel (NTK). In finite-width networks, the NTK is typically intractable to form, leading prior work to focus on restrictive settings such as tracking outputs only or taking infinite-width limits. Here, we study the structure of the NTK for a range of models. Formulating the model state as the solution to a single global implicit constraint, we derive the NTK as a product of two operators: K, accounting for immediate parameter-to-state interactions, and P, describing internal state-to-state dependencies. For a broad class of weight-based models, including RNNs and transformers, we prove a universal Kronecker-core theorem showing that K admits an exact,…
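The first sentence of the abstract, that each GD iteration induces a linear operator on the model's internal state, can be illustrated with a minimal NumPy sketch. It assumes the standard empirical-NTK construction, J Jᵀ, where J is the Jacobian of all state variables with respect to the parameters. It does not reproduce the paper's K and P factorization or its library, and the toy two-layer network and every name in it (`state`, `jacobian`, `ntk`, `eta`) are hypothetical choices made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid, n_out = 3, 4, 2          # toy sizes (assumed for illustration)
x = rng.normal(size=n_in)             # a single fixed input
target = rng.normal(size=n_out)       # a single fixed target


def unpack(theta):
    """Split the flat parameter vector into the two weight matrices."""
    W1 = theta[: n_hid * n_in].reshape(n_hid, n_in)
    W2 = theta[n_hid * n_in:].reshape(n_out, n_hid)
    return W1, W2


def state(theta):
    """All internal state variables (hidden units, then outputs) as one vector."""
    W1, W2 = unpack(theta)
    h = np.tanh(W1 @ x)
    y = W2 @ h
    return np.concatenate([h, y])


def jacobian(f, theta, eps=1e-6):
    """Central finite-difference Jacobian of f with respect to theta."""
    cols = []
    for i in range(theta.size):
        d = np.zeros_like(theta)
        d[i] = eps
        cols.append((f(theta + d) - f(theta - d)) / (2 * eps))
    return np.stack(cols, axis=1)


theta = 0.5 * rng.normal(size=n_hid * n_in + n_out * n_hid)
s = state(theta)

# Jacobian of the full state w.r.t. parameters, and the state-by-state kernel.
J = jacobian(state, theta)            # shape (n_hid + n_out, n_params)
ntk = J @ J.T                         # shape (n_hid + n_out, n_hid + n_out)

# Squared-error loss depends on the state only through the output entries,
# so its explicit gradient w.r.t. the state is zero on the hidden entries.
g_state = np.concatenate([np.zeros(n_hid), s[n_hid:] - target])

# One GD step on the parameters: grad_theta L = J^T g_state.
eta = 1e-4
theta_new = theta - eta * (J.T @ g_state)

actual = state(theta_new) - s         # true change in the state
predicted = -eta * (ntk @ g_state)    # first-order change via the kernel

rel_err = np.linalg.norm(actual - predicted) / np.linalg.norm(predicted)
print("relative first-order error:", rel_err)
print("kernel eigenvalues:", np.linalg.eigvalsh(ntk))
```

For a small step size the printed relative error should be small, since the kernel prediction differs from the true state change only at second order in `eta`. The eigenvalues give a rough view of which state directions a GD step moves most strongly in this toy setting.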
