On the Disconnect Between Theory and Practice of Neural Networks: Limits of the NTK Perspective
Jonathan Wenger, Felix Dangel, Agustinus Kristiadi

TL;DR
This paper investigates the practical relevance of neural tangent kernel (NTK) theory for large neural networks and finds that the theoretical benefits do not manifest in real-world architectures, which calls the theory's practical utility into question.
Contribution
The paper empirically examines the disconnect between NTK theory predictions and actual neural network behavior, highlighting limitations of NTK in practical settings.
Findings
NTK theory does not accurately predict the behavior of neural networks at the widths used in practice.
Architectures used in real-world applications are not sufficiently wide to exhibit NTK-predicted benefits.
The results challenge the applicability of NTK-based insights for neural network design.
Abstract
The neural tangent kernel (NTK) has garnered significant attention as a theoretical framework for describing the behavior of large-scale neural networks. Kernel methods are theoretically well-understood and as a result enjoy algorithmic benefits, which can be demonstrated to hold in wide synthetic neural network architectures. These advantages include faster optimization, reliable uncertainty quantification and improved continual learning. However, current results quantifying the rate of convergence to the kernel regime suggest that exploiting these benefits requires architectures that are orders of magnitude wider than they are deep. This assumption raises concerns that architectures used in practice do not exhibit behaviors as predicted by the NTK. Here, we supplement previous work on the NTK by empirically investigating whether the limiting regime predicts practically relevant…
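To make the notion of "convergence to the kernel regime" concrete, here is a minimal sketch of how an empirical NTK entry at initialization fluctuates less as width grows. It is not the paper's experimental setup: the two-layer ReLU network f(x) = a · relu(Wx) / sqrt(width), the standard-normal initialization, the two test inputs, and all function names are assumptions chosen for illustration.

```python
import math
import random


def init_params(width, dim, rng):
    """Draw standard-normal parameters for f(x) = a . relu(W x) / sqrt(width)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, a


def empirical_ntk(x1, x2, W, a):
    """Inner product of the parameter gradients of f at x1 and x2."""
    width = len(a)
    x_dot = sum(u * v for u, v in zip(x1, x2))
    total = 0.0
    for w_j, a_j in zip(W, a):
        pre1 = sum(w * u for w, u in zip(w_j, x1))
        pre2 = sum(w * u for w, u in zip(w_j, x2))
        # Gradient w.r.t. a_j contributes relu(pre1) * relu(pre2).
        total += max(pre1, 0.0) * max(pre2, 0.0)
        # Gradient w.r.t. w_j contributes a_j^2 * 1[pre1>0] * 1[pre2>0] * <x1, x2>.
        if pre1 > 0.0 and pre2 > 0.0:
            total += a_j * a_j * x_dot
    return total / width


def ntk_spread(width, trials=30, seed=0):
    """Standard deviation of one NTK entry across random initializations."""
    rng = random.Random(seed)
    x1, x2 = [1.0, 0.0], [0.6, 0.8]  # hypothetical unit-norm inputs
    vals = []
    for _ in range(trials):
        W, a = init_params(width, len(x1), rng)
        vals.append(empirical_ntk(x1, x2, W, a))
    mean = sum(vals) / trials
    return math.sqrt(sum((v - mean) ** 2 for v in vals) / trials)


if __name__ == "__main__":
    for width in (10, 100, 1000):
        print(width, ntk_spread(width))
```

In this toy setting the spread of the kernel entry shrinks as width increases, which is the sense in which the kernel becomes deterministic at initialization in the infinite-width limit. The sketch says nothing about depth or about how the kernel changes during training, which are the quantities the abstract's width-to-depth concern is about.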
Taxonomy
Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques
Methods: Neural Tangent Kernel
