The Global Empirical NTK: Self-Referential Bias and Dimensionality of Gradient Descent Learning
James Hazelden, Laura Driscoll, Eli Shlizerman, Eric Shea-Brown

TL;DR
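This paper introduces the Global Empirical Neural Tangent Kernel (NTK), the linear operator that governs first-order gradient descent updates to a model's internal state variables. It characterizes the structure, bias, and low-rank nature of this operator across a range of models, uses them to explain gradient descent learning dynamics, and proposes a library for NTK computations.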
Contribution
The paper derives a tractable, exact form of the NTK for diverse models, uncovers its structural bottlenecks and bias, and provides a practical library for NTK computations.
Findings
The NTK is structurally bottlenecked, which constrains its effective rank.
Gradient descent preferentially learns within the dominant modes of activity.
Model dynamics at initialization bias the NTK, which limits how effectively the model can learn.
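To make the idea of a bottleneck constraining effective rank concrete, here is a minimal NumPy sketch. It is not taken from the paper or its library. It assumes a hypothetical Jacobian that factors through a narrow intermediate dimension (the names `A`, `B`, and `bottleneck` are illustrative), and it uses the participation ratio as one common definition of effective rank; the paper may use a different measure.

```python
import numpy as np

def participation_ratio(kernel):
    """Effective rank as (sum of eigenvalues)^2 / (sum of squared eigenvalues).

    This is one common definition, assumed here for illustration.
    """
    # eigvalsh is for symmetric matrices; clip tiny negative round-off values.
    eigvals = np.clip(np.linalg.eigvalsh(kernel), 0.0, None)
    return eigvals.sum() ** 2 / (eigvals ** 2).sum()

rng = np.random.default_rng(0)
n_state, n_params, bottleneck = 20, 200, 3  # hypothetical sizes

# Hypothetical Jacobian that factors through a width-3 bottleneck.
A = rng.normal(size=(n_state, bottleneck))
B = rng.normal(size=(bottleneck, n_params))
J = A @ B

# Kernel of the form J J^T, as in a standard empirical NTK.
kernel = J @ J.T

hard_rank = np.linalg.matrix_rank(J)
eff_rank = participation_ratio(kernel)
print("algebraic rank:", hard_rank)
print("participation-ratio effective rank:", eff_rank)
```

Whatever the state and parameter dimensions, the kernel's rank here cannot exceed the bottleneck width, and the participation ratio is bounded by that rank.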
Abstract
In training a neural network with gradient descent (GD), each iteration induces a linear operator that governs first-order updates to a model's internal state variables. We define this operator as the Global Empirical Neural Tangent Kernel (NTK). In finite-width networks, the NTK is typically intractable to form, leading prior work to focus on restrictive settings such as tracking outputs only or taking infinite-width limits. Here, we study the structure of the NTK for a range of models. Formulating the model state as the solution to a single global implicit constraint, we derive the NTK as a product of two operators: K, accounting for immediate parameter-to-state interactions, and P, describing internal state-to-state dependencies. For a broad class of weight-based models, including RNNs and transformers, we prove a universal Kronecker-core theorem showing that K admits an exact,…
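The statement that each GD iteration induces a linear operator on the model's state can be checked on a toy case. The NumPy sketch below is an illustration under stated assumptions, not the paper's construction: it assumes a single-layer state z = tanh(W x), a squared-error loss, and forms the kernel directly as J Jᵀ from the state-to-parameter Jacobian rather than through the K and P operators described above. All function names are hypothetical.

```python
import numpy as np

def state(W, x):
    """Toy internal state: z = tanh(W x)."""
    return np.tanh(W @ x)

def state_jacobian(W, x):
    """Jacobian of z with respect to row-major vec(W).

    dz_i / dW_jk = delta_ij * (1 - z_i^2) * x_k, which is diag(1 - z^2) kron x^T.
    """
    z = state(W, x)
    D = np.diag(1.0 - z ** 2)
    return np.kron(D, x[None, :])

def predicted_state_update(W, x, target, lr):
    """First-order state change under one GD step: -lr * (J J^T) dL/dz."""
    g = state(W, x) - target          # dL/dz for L = 0.5 * ||z - target||^2
    J = state_jacobian(W, x)
    ntk = J @ J.T                     # toy empirical NTK on the state
    return -lr * ntk @ g, ntk

def actual_state_update(W, x, target, lr):
    """State change after actually taking one GD step on W."""
    z = state(W, x)
    g = z - target
    J = state_jacobian(W, x)
    grad_W = (J.T @ g).reshape(W.shape)   # chain rule: dL/dW = J^T dL/dz
    return state(W - lr * grad_W, x) - z

rng = np.random.default_rng(0)
W = 0.5 * rng.normal(size=(4, 3))
x = rng.normal(size=3)
target = np.zeros(4)
lr = 1e-4

pred, ntk = predicted_state_update(W, x, target, lr)
act = actual_state_update(W, x, target, lr)
print("max |predicted - actual|:", np.max(np.abs(pred - act)))
```

For a small learning rate the two updates agree up to second-order terms. In this toy case the kernel reduces to ‖x‖² · diag((1 − z²)²), so it is symmetric and positive semidefinite.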
