On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model

Peizhong Ju; Xiaojun Lin; Ness B. Shroff

arXiv:2206.02047 · cs.LG · June 7, 2022 · 1 citation

TL;DR

This paper analyzes the generalization capabilities of overparameterized 3-layer NTK models, revealing how test error decreases with network size and how the model's sensitivity to biases compares to 2-layer NTK models.

Contribution

It provides the first theoretical bounds on the generalization error of overfitted 3-layer NTK models and compares their bias sensitivity to 2-layer NTK models.

Findings

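01. For ground-truth functions in the learnable set, the upper bound on the test error decreases as the number of neurons in the two hidden layers grows.

02. The bound decreases faster with the number of neurons in the second hidden layer (closer to the output) than with the number in the first hidden layer (closer to the input).

03. The learnable set of the 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various bias choices, so the 3-layer NTK is less sensitive to the choice of bias.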

Abstract

In this paper, we study the generalization performance of overparameterized 3-layer NTK models. We show that, for a specific set of ground-truth functions (which we refer to as the "learnable set"), the test error of the overfitted 3-layer NTK is upper bounded by an expression that decreases with the number of neurons of the two hidden layers. Different from 2-layer NTK where there exists only one hidden-layer, the 3-layer NTK involves interactions between two hidden-layers. Our upper bound reveals that, between the two hidden-layers, the test error descends faster with respect to the number of neurons in the second hidden-layer (the one closer to the output) than with respect to that in the first hidden-layer (the one closer to the input). We also show that the learnable set of 3-layer NTK without bias is no smaller than that of 2-layer NTK models with various choices of bias in the…
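To make the setting concrete, here is a minimal NumPy sketch of an overfitted 3-layer NTK-style model. It is an illustration under stated assumptions, not the paper's exact construction: the sketch assumes ReLU activations, trains only the middle-layer weights, places inputs on the unit sphere, and uses a made-up ground-truth function. The widths `p1` and `p2`, the helper names `ntk_features` and `unit_rows`, and the target `x[0] * x[1]` are all hypothetical. "Overfitted" is taken here to mean the minimum ℓ2-norm weight perturbation that fits the training data exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
d, p1, p2, n = 3, 20, 30, 10  # input dim, hidden widths, training samples (illustrative)

# Random initialization of a 3-layer ReLU network f(x) = w3 . relu(W2 relu(W1 x)).
W1 = rng.standard_normal((p1, d))
W2 = rng.standard_normal((p2, p1))
w3 = rng.standard_normal(p2)

def ntk_features(X):
    """Gradient of f with respect to W2 at initialization, one flattened row per input."""
    H1 = np.maximum(X @ W1.T, 0.0)    # first hidden-layer outputs, shape (m, p1)
    gate = (H1 @ W2.T > 0) * w3       # w3[k] * 1[second-layer unit k active], shape (m, p2)
    # df/dW2[k, j] = w3[k] * 1[unit k active] * H1[j]
    return (gate[:, :, None] * H1[:, None, :]).reshape(len(X), -1)

def unit_rows(A):
    """Scale each row to unit Euclidean norm (assumption: inputs lie on the unit sphere)."""
    return A / np.linalg.norm(A, axis=1, keepdims=True)

# Hypothetical ground truth, chosen only for the demo.
X_train = unit_rows(rng.standard_normal((n, d)))
y_train = X_train[:, 0] * X_train[:, 1]

# With p1 * p2 = 600 features and n = 10 samples the linear system is underdetermined;
# the pseudoinverse picks the minimum l2-norm solution that interpolates the data.
Phi = ntk_features(X_train)
dw = np.linalg.pinv(Phi) @ y_train
train_residual = np.max(np.abs(Phi @ dw - y_train))

X_test = unit_rows(rng.standard_normal((200, d)))
y_test = X_test[:, 0] * X_test[:, 1]
test_mse = np.mean((ntk_features(X_test) @ dw - y_test) ** 2)
print(f"train residual: {train_residual:.2e}, test MSE: {test_mse:.3f}")
```

Rerunning the sketch with different values of `p1` and `p2` gives a rough empirical view of how test error moves with each hidden-layer width, although a single random draw is noisy and is no substitute for the bound described in the abstract.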

Videos

On the Generalization Power of the Overfitted Three-Layer Neural Tangent Kernel Model · SlidesLive

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning

Methods: Neural Tangent Kernel