The Spectral Dimension of NTKs is Constant: A Theory of Implicit Regularization, Finite-Width Stability, and Scalable Estimation
Praveen Anilkumar Shukla

TL;DR
This paper shows that the spectral dimension (effective rank) of the Neural Tangent Kernel at initialization follows a constant-limit law, with implications for implicit regularization, finite-width stability, and scalable estimation in overparameterized deep networks.
Contribution
It proves a constant-limit law for the effective rank of NTK Gram matrices at initialization and develops a scalable estimator for this quantity.
Findings
Effective rank of NTK remains constant as network width increases.
Finite-width NTK deviations are controlled and predictable.
Empirical results on CIFAR-10 align with theoretical predictions.
Abstract
Modern deep networks are heavily overparameterized yet often generalize well, suggesting a form of low intrinsic complexity not reflected by parameter counts. We study this complexity at initialization through the effective rank of the Neural Tangent Kernel (NTK) Gram matrix. For i.i.d. data and the infinite-width NTK, we prove a constant-limit law for this effective rank, with sub-Gaussian concentration. We further establish finite-width stability: if the finite-width NTK deviates from the infinite-width NTK in operator norm by a width-dependent amount, then the effective rank changes by a correspondingly bounded amount. We design a scalable estimator using random output probes and a CountSketch of parameter Jacobians and prove conditional unbiasedness and consistency with explicit…
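As a rough illustration of the quantities named in the abstract, the sketch below computes an effective rank of a Gram matrix and approximates it from a CountSketch of a Jacobian-like matrix. Several things here are assumptions, not taken from the paper: the effective rank is taken to be the participation ratio tr(K)^2 / ||K||_F^2 (the listing does not show the paper's definition), the function names are hypothetical, a random Gaussian matrix stands in for the parameter Jacobians, and the random output probes mentioned in the abstract are omitted. It should not be read as the paper's estimator.

```python
import numpy as np


def effective_rank(K):
    """Participation-ratio effective rank tr(K)^2 / ||K||_F^2.

    Assumed definition for illustration: it equals n for the n x n identity
    and 1 for any rank-one PSD matrix.
    """
    K = np.asarray(K, dtype=float)
    return float(np.trace(K) ** 2 / np.sum(K * K))


def countsketch(J, sketch_dim, rng):
    """Compress the parameter axis of J (n x p) down to sketch_dim columns.

    Each parameter column gets a random sign and is added into one random
    bucket, so that E[S @ S.T] = J @ J.T for independent hashes and signs.
    """
    n, p = J.shape
    buckets = rng.integers(0, sketch_dim, size=p)  # hash: parameter -> bucket
    signs = rng.choice([-1.0, 1.0], size=p)        # random sign per parameter
    St = np.zeros((sketch_dim, n))
    # np.add.at accumulates correctly when several parameters share a bucket.
    np.add.at(St, buckets, (J * signs).T)
    return St.T


def sketched_effective_rank(J, sketch_dim, rng):
    """Effective rank of the sketched Gram matrix S @ S.T."""
    S = countsketch(J, sketch_dim, rng)
    return effective_rank(S @ S.T)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    J = rng.standard_normal((20, 5000))  # stand-in for per-example Jacobians
    print("exact   :", effective_rank(J @ J.T))
    print("sketched:", sketched_effective_rank(J, 2000, rng))
```

The sketched Gram matrix is unbiased entrywise, but the effective rank is a nonlinear function of it, so the sketched value carries a bias that shrinks as `sketch_dim` grows; the paper's consistency statement presumably addresses this kind of effect for its own estimator.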
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Markov Chains and Monte Carlo Methods
