On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model
Peizhong Ju, Xiaojun Lin, Ness B. Shroff

TL;DR
This paper analyzes the generalization capabilities of overparameterized 3-layer NTK models, revealing how test error decreases with network size and how the model's sensitivity to biases compares to 2-layer NTK models.
Contribution
It provides the first theoretical bounds on the generalization error of overfitted 3-layer NTK models and compares their bias sensitivity to 2-layer NTK models.
Findings
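For ground-truth functions in the learnable set, the upper bound on test error decreases with the number of neurons in the two hidden layers.
The bound decreases faster with the number of neurons in the second hidden layer (closer to the output) than in the first hidden layer (closer to the input).
The 3-layer NTK is less sensitive to the choice of bias than the 2-layer NTK: its learnable set without bias is no smaller than that of 2-layer NTK models under various bias choices.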
Abstract
In this paper, we study the generalization performance of overparameterized 3-layer NTK models. We show that, for a specific set of ground-truth functions (which we refer to as the "learnable set"), the test error of the overfitted 3-layer NTK is upper bounded by an expression that decreases with the number of neurons in the two hidden layers. Unlike the 2-layer NTK, which has only one hidden layer, the 3-layer NTK involves interactions between two hidden layers. Our upper bound reveals that, between the two hidden layers, the test error descends faster with respect to the number of neurons in the second hidden layer (the one closer to the output) than with respect to that in the first hidden layer (the one closer to the input). We also show that the learnable set of 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various choices of bias in the…
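As a rough illustration of what an "overfitted NTK model" looks like in practice, the sketch below linearizes a bias-free 3-layer ReLU network around a random initialization and takes the minimum-norm weight update that interpolates the training data. The summary here does not state which weights are trained, how they are initialized, or how inputs are normalized, so every such choice below is an assumption: only the middle-layer weights `W2` are updated, the weights are Gaussian, inputs lie on the unit sphere, and the ground-truth function and the sizes `d`, `p1`, `p2`, `n` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p1, p2, n = 3, 40, 60, 20   # input dim, hidden widths, training samples (hypothetical)

# Random initialization of a bias-free 3-layer ReLU network (assumed Gaussian).
W1 = rng.standard_normal((p1, d))    # first hidden layer (closer to the input)
W2 = rng.standard_normal((p2, p1))   # second hidden layer (closer to the output)
w3 = rng.standard_normal(p2)         # output weights, kept fixed

def unit_sphere(m):
    """Sample m points uniformly on the unit sphere in R^d."""
    X = rng.standard_normal((m, d))
    return X / np.linalg.norm(X, axis=1, keepdims=True)

def ntk_features(X):
    """Gradient of f(x) = w3 . relu(W2 relu(W1 x)) with respect to W2, flattened."""
    H1 = np.maximum(X @ W1.T, 0.0)              # first-layer activations, shape (m, p1)
    gate = (H1 @ W2.T > 0).astype(float) * w3   # df/d(pre-activation), shape (m, p2)
    feats = gate[:, :, None] * H1[:, None, :]   # outer product per sample, (m, p2, p1)
    return feats.reshape(X.shape[0], -1)

def ground_truth(X):
    """Hypothetical target function, chosen only for illustration."""
    return X[:, 0] * X[:, 1]

X_train, X_test = unit_sphere(n), unit_sphere(500)
y_train, y_test = ground_truth(X_train), ground_truth(X_test)

Phi_train = ntk_features(X_train)                # shape (n, p1 * p2), with p1 * p2 >> n
delta_w = np.linalg.pinv(Phi_train) @ y_train    # min-L2-norm interpolating update

train_mse = np.mean((Phi_train @ delta_w - y_train) ** 2)
test_mse = np.mean((ntk_features(X_test) @ delta_w - y_test) ** 2)
print(f"train MSE: {train_mse:.2e}, test MSE: {test_mse:.2e}")
```

Because the number of features `p1 * p2` far exceeds `n`, the training error is zero up to floating-point precision. Re-running with different `p1` and `p2` gives an informal look at how test error varies with each width, although a single random draw is noisy and does not substitute for the paper's bounds.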
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
Methods: Neural Tangent Kernel
