Beyond Scaling Curves: Internal Dynamics of Neural Networks Through the NTK Lens
Konstantin Nikolaou, Sven Krippendorf, Samuel Tovey, Christian Holm

TL;DR
This paper investigates neural network scaling laws using the NTK framework, revealing that performance improvements can occur despite differing internal dynamics and identifying the width limit for effective feature learning.
Contribution
It provides empirical analysis linking performance scaling to internal dynamics via NTK and clarifies how model width influences feature learning and scaling regimes.
Findings
Similar performance scaling exponents can occur with opposite internal dynamics.
Feature learning diminishes as model width increases beyond a certain threshold.
The maximum effective model width for feature learning is significantly smaller than the widths of large language models.
Abstract
Scaling laws offer valuable insights into the relationship between neural network performance and computational cost, yet their underlying mechanisms remain poorly understood. In this work, we empirically analyze how neural networks behave under data and model scaling through the lens of the neural tangent kernel (NTK). This analysis establishes a link between performance scaling and the internal dynamics of neural networks. Our findings on standard vision tasks show that similar performance scaling exponents can occur even though the internal model dynamics show opposite behavior. This demonstrates that performance scaling alone is insufficient for understanding the underlying mechanisms of neural networks. We also address a previously unresolved issue in neural scaling: how convergence to the infinite-width limit affects scaling behavior in finite-width models. To this end, we…
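The kind of NTK measurement this analysis relies on can be sketched for a toy model. The following is a minimal NumPy illustration, not the authors' code: the two-layer tanh network, the NTK-style 1/sqrt(width) scaling, the helper names (init_params, forward, jacobian, empirical_ntk, train, relative_kernel_change), and the random regression data are all assumptions made for this example. It computes the empirical NTK K = J J^T from the parameter Jacobian J and reports how far the kernel moves during training, one common proxy for feature learning.

```python
import numpy as np

def init_params(rng, d_in, width):
    # Assumed NTK-style parameterization: N(0, 1) weights, scaling in the forward pass.
    W1 = rng.standard_normal((width, d_in))
    w2 = rng.standard_normal(width)
    return W1, w2

def forward(params, X):
    W1, w2 = params
    H = np.tanh(X @ W1.T / np.sqrt(X.shape[1]))      # hidden activations, (n, width)
    return H @ w2 / np.sqrt(W1.shape[0])             # scalar output per sample, (n,)

def jacobian(params, X):
    # Analytic Jacobian of the outputs w.r.t. all parameters, flattened as [W1, w2].
    W1, w2 = params
    n, d_in = X.shape
    width = W1.shape[0]
    H = np.tanh(X @ W1.T / np.sqrt(d_in))
    J_w2 = H / np.sqrt(width)                        # d f / d w2, (n, width)
    dH = (1.0 - H**2) * w2 / np.sqrt(width)          # d f / d pre-activation, (n, width)
    J_W1 = dH[:, :, None] * X[:, None, :] / np.sqrt(d_in)  # (n, width, d_in)
    return np.concatenate([J_W1.reshape(n, -1), J_w2], axis=1)

def empirical_ntk(params, X):
    # Empirical NTK: Gram matrix of per-sample parameter gradients.
    J = jacobian(params, X)
    return J @ J.T

def train(params, X, y, lr=0.1, steps=50):
    # Plain full-batch gradient descent on the loss 0.5 * mean((f - y)^2).
    W1, w2 = params
    for _ in range(steps):
        J = jacobian((W1, w2), X)
        grad = J.T @ (forward((W1, w2), X) - y) / len(y)
        W1 = W1 - lr * grad[:W1.size].reshape(W1.shape)
        w2 = w2 - lr * grad[W1.size:]
    return W1, w2

def relative_kernel_change(K0, Kt):
    # Frobenius-norm distance between initial and trained kernel, relative to K0.
    return np.linalg.norm(Kt - K0) / np.linalg.norm(K0)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((20, 6))                 # hypothetical inputs
    y = np.sin(X[:, 0])                              # hypothetical regression targets
    for width in (16, 64, 256):
        p0 = init_params(rng, X.shape[1], width)
        K0 = empirical_ntk(p0, X)
        Kt = empirical_ntk(train(p0, X, y), X)
        print(width, relative_kernel_change(K0, Kt))
```

Under this parameterization the relative kernel change is expected to shrink as the width grows, which is the lazy-training behavior associated with the infinite-width limit. The exact numbers depend on the seed, learning rate, and number of steps.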
