On the Impacts of the Random Initialization in the Neural Tangent Kernel Theory
Guhan Chen, Yicheng Li, Qian Lin

TL;DR
This paper investigates how random initialization affects the neural tangent kernel (NTK) theory, revealing that traditional and mirrored initializations lead to different generalization behaviors and that NTK theory alone may not fully explain neural network performance.
Contribution
It demonstrates the convergence of neural network training dynamics to NTK regression and analyzes the function space of Gaussian process limits, highlighting limitations of NTK theory.
Findings
Training dynamics converge to NTK regression with random initialization.
The Gaussian process lies in a specific interpolation space of the NTK RKHS.
Wide neural networks suffer from the curse of dimensionality in generalization.
Abstract
This paper discusses the impact of random initialization of neural networks in the neural tangent kernel (NTK) theory, which most recent works on NTK theory ignore. It is well known that, as the network's width tends to infinity, a neural network with random initialization converges to a Gaussian process taking values in a function space over the domain of the data. In contrast, to adopt the traditional theory of kernel regression, most recent works introduced a special mirrored architecture and a mirrored (random) initialization to ensure the network's output is identically zero at initialization. It therefore remains an open question whether the conventional setting and the mirrored initialization lead wide neural networks to exhibit different generalization capabilities. In this paper, we first show that the training dynamics…
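To make the contrast between the two initializations concrete, here is a minimal sketch in plain Python. It is not the paper's construction: the two-layer ReLU network, the 1/sqrt(width) output scaling, the 1/sqrt(2) factor on the mirrored difference, and all function names are illustrative assumptions. The only property it is meant to show is the one stated in the abstract, namely that a mirrored network outputs zero at initialization while a conventionally initialized one does not.

```python
import copy
import math
import random


def init_params(rng, d_in, width):
    """Draw every weight i.i.d. from N(0, 1) (assumed NTK-style initialization)."""
    W = [[rng.gauss(0.0, 1.0) for _ in range(d_in)] for _ in range(width)]
    a = [rng.gauss(0.0, 1.0) for _ in range(width)]
    return W, a


def two_layer_relu(params, x):
    """Two-layer ReLU network with an assumed 1/sqrt(width) output scaling."""
    W, a = params
    hidden = [max(0.0, sum(w_j * x_j for w_j, x_j in zip(row, x))) for row in W]
    return sum(a_i * h_i for a_i, h_i in zip(a, hidden)) / math.sqrt(len(a))


def mirrored(params_plus, params_minus, x):
    """Hypothetical mirrored architecture: scaled difference of two branches."""
    diff = two_layer_relu(params_plus, x) - two_layer_relu(params_minus, x)
    return diff / math.sqrt(2.0)


rng = random.Random(0)
x = [0.3, -1.2, 0.5]  # an arbitrary input point

# Conventional setting: a single randomly initialized network.
params = init_params(rng, d_in=len(x), width=256)
standard_out = two_layer_relu(params, x)

# Mirrored initialization: the second branch starts as an exact copy of the
# first, so the two branch outputs cancel before any training step is taken.
params_copy = copy.deepcopy(params)
mirrored_out = mirrored(params, params_copy, x)

print("standard output at init:", standard_out)
print("mirrored output at init:", mirrored_out)  # exactly 0.0 by construction
```

The two branches are stored separately (via a deep copy) because in a mirrored architecture they would be trained as independent parameters; they coincide only at initialization.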
Taxonomy
Topics: Neural Networks and Applications
Methods: Neural Tangent Kernel · Gaussian Process
