The Spectral Dimension of NTKs is Constant: A Theory of Implicit Regularization, Finite-Width Stability, and Scalable Estimation

Praveen Anilkumar Shukla

arXiv:2512.00860 · cs.LG · December 2, 2025

TL;DR

This paper shows that the spectral dimension (effective rank) of Neural Tangent Kernel Gram matrices at initialization converges to a constant as the number of samples grows, with implications for implicit regularization, finite-width stability, and scalable estimation in overparameterized deep networks.

Contribution

It proves a constant-limit law for the effective rank of NTK Gram matrices at initialization and develops a scalable estimator for this quantity.
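
The constant-limit law can be checked numerically for a kernel whose infinite-width NTK has a closed form. This is a minimal sketch under stated assumptions, not the paper's experiment: it assumes a hypothetical two-layer ReLU network $f(x) = m^{-1/2} \sum_j a_j \,\mathrm{relu}(w_j \cdot x)$ with $a_j \sim N(0,1)$, $w_j \sim N(0, I)$, no biases, and both layers trained, with inputs drawn uniformly from the unit sphere in $\mathbb{R}^{10}$. It computes $r_{\mathrm{eff}}(K) = (\operatorname{tr} K)^2 / \|K\|_F^2$ on Gram matrices of growing size $n$ and compares them with a Monte Carlo estimate of $r_\infty = \mathbb{E}[k(x,x)]^2 / \mathbb{E}[k(x,x')^2]$. The helper names (`ntk_entry`, `ntk_gram`, `sample_sphere`) and all sizes are illustrative.

```python
import numpy as np

def effective_rank(K):
    """r_eff(K) = (tr K)^2 / ||K||_F^2 for a symmetric PSD matrix K."""
    return np.trace(K) ** 2 / np.sum(K * K)

def ntk_entry(dot, nx, ny):
    """Closed-form infinite-width NTK of the assumed two-layer ReLU network,
    evaluated from x.y, ||x||, ||y|| (works elementwise on arrays)."""
    u = np.clip(dot / (nx * ny), -1.0, 1.0)
    theta = np.arccos(u)
    # Output-layer term: E[relu(w.x) relu(w.y)] (degree-1 arc-cosine kernel).
    k_out = nx * ny * (np.sin(theta) + (np.pi - theta) * u) / (2 * np.pi)
    # Hidden-layer term: (x.y) * E[a^2] * P(w.x > 0, w.y > 0).
    k_hid = dot * (np.pi - theta) / (2 * np.pi)
    return k_out + k_hid

def ntk_gram(X):
    """n x n NTK Gram matrix K_n with entries k(x_i, x_j)."""
    norms = np.linalg.norm(X, axis=1)
    return ntk_entry(X @ X.T, norms[:, None], norms[None, :])

def sample_sphere(n, d, rng):
    """n i.i.d. points drawn uniformly from the unit sphere in R^d."""
    Z = rng.standard_normal((n, d))
    return Z / np.linalg.norm(Z, axis=1, keepdims=True)

rng = np.random.default_rng(0)
d = 10

# Monte Carlo estimate of r_inf = E[k(x,x)]^2 / E[k(x,x')^2], x and x' independent.
# All points lie on the unit sphere, so both norms are passed as 1.0.
x, xp = sample_sphere(200_000, d, rng), sample_sphere(200_000, d, rng)
k_same = ntk_entry(np.sum(x * x, axis=1), 1.0, 1.0)
k_cross = ntk_entry(np.sum(x * xp, axis=1), 1.0, 1.0)
r_inf = np.mean(k_same) ** 2 / np.mean(k_cross ** 2)

# Effective rank of the Gram matrix K_n for growing sample size n.
r_by_n = {n: effective_rank(ntk_gram(sample_sphere(n, d, rng))) for n in (100, 400, 1600)}
for n, r in r_by_n.items():
    print(f"n={n:5d}  r_eff(K_n)={r:.2f}  r_inf~{r_inf:.2f}")
```

For this kernel $k(x,x) = 1$ on the sphere, so $\operatorname{tr} K_n = n$ exactly and $\mathbb{E}\|K_n\|_F^2 = n + n(n-1)\,\mathbb{E}[k(x,x')^2]$. Hence $r_{\mathrm{eff}}(K_n)$ behaves roughly like $n / (1 + (n-1)/r_\infty)$, which approaches $r_\infty$ from below as $n$ grows.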

Findings

01

The expected effective rank of the NTK Gram matrix at initialization converges to a constant as the number of samples grows.

02

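If the finite-width NTK deviates from its infinite-width limit by $O_{p}(m^{-1/2})$ in operator norm, where $m$ is the width, the effective rank changes by only $O_{p}(m^{-1/2})$.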

03

Empirical results on CIFAR-10 align with theoretical predictions.
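
The finite-width stability claim can be illustrated by comparing the empirical NTK of a randomly initialized network with its infinite-width limit on the same inputs. This is a minimal sketch, not the paper's CIFAR-10 setup: it assumes a hypothetical two-layer ReLU network $f(x) = m^{-1/2} \sum_j a_j \,\mathrm{relu}(w_j \cdot x)$ with $a_j \sim N(0,1)$, $w_j \sim N(0, I)$, no biases, both layers trained, and unit-norm inputs; the widths, sample size, and number of trials are illustrative. It reports the mean absolute gap in $r_{\mathrm{eff}}$ over a few initializations at each width. Under the abstract's stability result, a gap of order $m^{-1/2}$ is expected whenever the operator-norm deviation is of that order.

```python
import numpy as np

def effective_rank(K):
    """r_eff(K) = (tr K)^2 / ||K||_F^2."""
    return np.trace(K) ** 2 / np.sum(K * K)

def ntk_gram_infinite(X):
    """Closed-form infinite-width NTK Gram matrix for the assumed two-layer
    ReLU network (rows of X are assumed to have unit norm)."""
    dot = X @ X.T
    u = np.clip(dot, -1.0, 1.0)
    theta = np.arccos(u)
    k_out = (np.sin(theta) + (np.pi - theta) * u) / (2 * np.pi)
    k_hid = dot * (np.pi - theta) / (2 * np.pi)
    return k_out + k_hid

def ntk_gram_empirical(X, m, rng):
    """Empirical NTK Gram matrix at one random initialization of width m."""
    W = rng.standard_normal((m, X.shape[1]))   # hidden weights w_j ~ N(0, I)
    a = rng.standard_normal(m)                 # output weights a_j ~ N(0, 1)
    Z = X @ W.T                                # pre-activations, shape (n, m)
    H = np.maximum(Z, 0.0)                     # relu(w_j . x)
    D = (Z > 0).astype(float)                  # relu'(w_j . x)
    # Gradients w.r.t. a_j are relu(w_j . x) / sqrt(m).
    K_out = H @ H.T / m
    # Gradients w.r.t. w_j are a_j relu'(w_j . x) x / sqrt(m); their inner
    # products factor as (x . y) * sum_j a_j^2 relu'(w_j . x) relu'(w_j . y) / m.
    K_hid = (X @ X.T) * ((D * a ** 2) @ D.T) / m
    return K_out + K_hid

rng = np.random.default_rng(1)
n, d, trials = 150, 10, 5
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs

r_inf_width = effective_rank(ntk_gram_infinite(X))
mean_gap = {}
for m in (100, 1_000, 10_000):
    gaps = [abs(effective_rank(ntk_gram_empirical(X, m, rng)) - r_inf_width)
            for _ in range(trials)]
    mean_gap[m] = float(np.mean(gaps))
    print(f"m={m:6d}  mean |r_eff(finite) - r_eff(infinite)| = {mean_gap[m]:.3f}")
```

Averaging over several initializations keeps the comparison from hinging on a single draw. At small widths part of the gap is systematic: the trace of the empirical NTK is unbiased, but entrywise fluctuations inflate $\|K\|_F^2$ on average, which tends to pull the finite-width effective rank down.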

Abstract

Modern deep networks are heavily overparameterized yet often generalize well, suggesting a form of low intrinsic complexity not reflected by parameter counts. We study this complexity at initialization through the effective rank of the Neural Tangent Kernel (NTK) Gram matrix, $r_{\mathrm{eff}}(K) = (\operatorname{tr} K)^{2} / \|K\|_{F}^{2}$. For i.i.d. data and the infinite-width NTK $k$, we prove a constant-limit law $\lim_{n \to \infty} \mathbb{E}[r_{\mathrm{eff}}(K_{n})] = \mathbb{E}[k(x, x)]^{2} / \mathbb{E}[k(x, x')^{2}] =: r_{\infty}$, with sub-Gaussian concentration. We further establish finite-width stability: if the finite-width NTK deviates in operator norm by $O_{p}(m^{-1/2})$ (width $m$), then $r_{\mathrm{eff}}$ changes by $O_{p}(m^{-1/2})$. We design a scalable estimator using random output probes and a CountSketch of parameter Jacobians and prove conditional unbiasedness and consistency with explicit…
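
The estimator described above combines random output probes with a CountSketch of parameter Jacobians. The sketch below is one plausible reading of that description, not the paper's construction. It assumes a hypothetical $C$-output two-layer ReLU network $f(x) = A\,\mathrm{relu}(Wx)/\sqrt{m}$ and contracts the outputs with a single Rademacher probe $v$, so the Jacobian rows of $v^\top f$ give a probed Gram matrix whose expectation over $v$ is the output-trace NTK (because $\mathbb{E}[v v^\top] = I$). Each Jacobian row is then compressed to $s$ buckets with a CountSketch. Conditional on the network and the probe, the sketched Gram matrix is entrywise unbiased for the probed one, since sign collisions between distinct parameters cancel in expectation; $r_{\mathrm{eff}}$ is then a plug-in ratio, and no unbiasedness is claimed for it here. Function names and sizes are illustrative.

```python
import numpy as np

def effective_rank(K):
    """r_eff(K) = (tr K)^2 / ||K||_F^2."""
    return np.trace(K) ** 2 / np.sum(K * K)

def probed_jacobian(X, W, A, v):
    """Rows are gradients of v . f(x) w.r.t. all parameters, for the hypothetical
    network f(x) = A relu(W x) / sqrt(m); columns ordered [vec(A), vec(W)]."""
    n, m = X.shape[0], W.shape[0]
    Z = X @ W.T                                   # (n, m) pre-activations
    H = np.maximum(Z, 0.0)                        # (n, m) hidden activations
    D = (Z > 0).astype(float)                     # (n, m) relu derivatives
    # d(v.f)/dA[c, j] = v_c * relu(w_j . x) / sqrt(m)
    J_A = (v[None, :, None] * H[:, None, :]).reshape(n, -1)
    # d(v.f)/dW[j, k] = (A^T v)_j * relu'(w_j . x) * x_k / sqrt(m)
    J_W = (((A.T @ v)[None, :] * D)[:, :, None] * X[:, None, :]).reshape(n, -1)
    return np.hstack([J_A, J_W]) / np.sqrt(m)

def countsketch_rows(J, s, rng):
    """Return J S^T, where S is a random s x P CountSketch matrix."""
    P = J.shape[1]
    buckets = rng.integers(0, s, size=P)          # hash h: parameter -> bucket
    signs = rng.choice([-1.0, 1.0], size=P)       # random sign per parameter
    out = np.zeros((s, J.shape[0]))
    np.add.at(out, buckets, (J * signs).T)        # bucket h(p) += sign(p) * J[:, p]
    return out.T                                  # shape (n, s)

rng = np.random.default_rng(2)
n, d, m, C, s = 64, 8, 256, 10, 2048
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)
W = rng.standard_normal((m, d))
A = rng.standard_normal((C, m))
v = rng.choice([-1.0, 1.0], size=C)               # Rademacher output probe

J = probed_jacobian(X, W, A, v)                   # (n, C*m + m*d)
K_exact = J @ J.T                                 # probed Gram matrix
Y = countsketch_rows(J, s, rng)
K_sketch = Y @ Y.T                                # sketched Gram matrix

r_exact, r_sketch = effective_rank(K_exact), effective_rank(K_sketch)
print(f"P={J.shape[1]}  s={s}  r_eff exact={r_exact:.2f}  sketched={r_sketch:.2f}")
```

This toy version still materializes the full $n \times P$ Jacobian so the exact value can be printed for comparison. Because CountSketch is linear in the Jacobian, a scalable implementation could instead accumulate the $n \times s$ sketch one parameter block at a time, so memory scales with $s$ rather than $P$.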

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods