TL;DR
This paper introduces the Weighted Neural Tangent Kernel (WNTK), a generalized kernel that better captures neural network training dynamics under various optimizers, improving upon the original NTK in theory and practice.
Contribution
The paper proposes the WNTK, extending NTK to incorporate different optimizers, with theoretical stability proofs and empirical improvements over NTK.
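As a rough orientation, the two kernels can be written side by side. This reads "different optimizers" as different learning rates on different parameters, which is how the abstract describes it. The per-parameter form and the symbols $w_p$ and $\Theta_w$ are assumptions of this sketch, not notation quoted from the paper.

```latex
\Theta(x, x') = \sum_{p} \frac{\partial f(x;\theta)}{\partial \theta_p}\,
                         \frac{\partial f(x';\theta)}{\partial \theta_p},
\qquad
\Theta_w(x, x') = \sum_{p} w_p\,\frac{\partial f(x;\theta)}{\partial \theta_p}\,
                              \frac{\partial f(x';\theta)}{\partial \theta_p}.
```

Under this reading, $w_p \ge 0$ acts as a relative learning rate for parameter $\theta_p$, and setting every $w_p = 1$ recovers the NTK.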
Findings
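In the infinite-width limit, the WNTK is stable at initialization and during training.
In the infinite-width limit, the WNTK regression estimator is equivalent to the corresponding NN estimator trained with different learning rates on different parameters.
In experiments, the WNTK outperforms the NTK.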
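The link between a weighted kernel and per-parameter learning rates can be checked numerically on a toy problem. The sketch below is not the paper's algorithm. It assumes the weighted kernel is a per-parameter weighted sum of gradient products, and it uses a random-feature model that is exactly linear in its parameters as a stand-in for the infinite-width regime. All names (`features`, `weighted_kernel`, `weights`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n_train, n_test, dim, n_params = 6, 4, 10, 300

X_train = rng.normal(size=(n_train, dim))
X_test = rng.normal(size=(n_test, dim))
y_train = np.sin(X_train[:, 0])

# Toy model f(x) = features(x) @ theta, linear in theta, so the Jacobian
# with respect to theta is just the feature matrix (assumption of this sketch).
proj = rng.normal(size=(dim, n_params)) / np.sqrt(dim)
phase = rng.uniform(0.0, 2.0 * np.pi, size=n_params)

def features(X):
    return np.sqrt(2.0 / n_params) * np.cos(X @ proj + phase)

J_train = features(X_train)
J_test = features(X_test)

# Hypothetical per-parameter weights (relative learning rates).
weights = rng.uniform(0.5, 2.0, size=n_params)

def weighted_kernel(J_a, J_b, w):
    # K_w(x, x') = sum_p w_p * df(x)/dtheta_p * df(x')/dtheta_p
    return (J_a * w) @ J_b.T

K_train = weighted_kernel(J_train, J_train, weights)
K_test = weighted_kernel(J_test, J_train, weights)

# Weighted-kernel regression estimator (zero initial function).
kernel_pred = K_test @ np.linalg.solve(K_train, y_train)

# Gradient descent on squared loss with learning rate base_lr * w_p
# for parameter p, starting from theta = 0.
base_lr = 1.0 / np.linalg.eigvalsh(K_train).max()
theta = np.zeros(n_params)
for _ in range(5000):
    residual = J_train @ theta - y_train
    theta -= base_lr * weights * (J_train.T @ residual)
gd_pred = J_test @ theta

max_gap = np.max(np.abs(kernel_pred - gd_pred))
print(f"max |kernel - GD| on test points: {max_gap:.2e}")
```

For a model that is linear in its parameters the agreement is exact up to optimization error. For a real network it would hold only approximately, in the wide-network regime that the stability result addresses.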
Abstract
The Neural Tangent Kernel (NTK) has recently attracted intense study, as it describes the evolution of an over-parameterized Neural Network (NN) trained by gradient descent. However, it is now well-known that gradient descent is not always a good optimizer for NNs, which can partially explain the unsatisfactory practical performance of the NTK regression estimator. In this paper, we introduce the Weighted Neural Tangent Kernel (WNTK), a generalized and improved tool, which can capture an over-parameterized NN's training dynamics under different optimizers. Theoretically, in the infinite-width limit, we prove: i) the stability of the WNTK at initialization and during training, and ii) the equivalence between the WNTK regression estimator and the corresponding NN estimator with different learning rates on different parameters. With the proposed weight update algorithm, both empirical and…
Taxonomy
Methods: Neural Tangent Kernel
