TL;DR
This paper investigates how neural networks compress uninformative input directions during training. This compression makes the test error decay faster with the size of the training set and better aligns the neural tangent kernel with label-relevant features.
Contribution
It introduces a geometric perspective on how neural networks compress invariant manifolds, and it quantifies the impact of this compression on learning curves and on the evolution of the neural tangent kernel.
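One simple way to make "compression" concrete is to compare how strongly the first-layer weights respond to informative versus uninformative input directions. The sketch below assumes the informative directions are the first `d_par` input coordinates. The function name `compression_ratio` and the RMS-ratio definition are hypothetical illustrations, not necessarily the paper's exact measure.

```python
import numpy as np

def compression_ratio(W: np.ndarray, d_par: int) -> float:
    """Hypothetical compression measure for a first-layer weight matrix W (h x d).

    Assumes the informative (label-relevant) directions are the first `d_par`
    input coordinates and the remaining d - d_par coordinates are uninformative.
    Returns RMS weight magnitude on informative coordinates divided by RMS
    magnitude on uninformative ones; values >> 1 mean the layer has become
    nearly insensitive to the uninformative directions.
    """
    w_par = W[:, :d_par]   # weights acting on informative coordinates
    w_perp = W[:, d_par:]  # weights acting on uninformative coordinates
    rms_par = np.sqrt(np.mean(w_par ** 2))
    rms_perp = np.sqrt(np.mean(w_perp ** 2))
    return float(rms_par / rms_perp)

# An isotropic weight matrix shows no compression (ratio 1).
print(compression_ratio(np.ones((3, 6)), d_par=2))

# Shrinking the uninformative columns by 10x gives a ratio of about 10.
W = np.ones((4, 5))
W[:, 1:] *= 0.1
print(compression_ratio(W, d_par=1))
```

Tracking this ratio during training would, under these assumptions, grow in the feature learning regime and stay near its initial value in the lazy regime.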
Findings
Compression occurs in the feature learning regime, improving test error.
Lazy training shows no compression, and its learning curves decay more slowly.
Kernel eigenvectors become more label-aligned due to compression.
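The label-alignment finding can be quantified by projecting the label vector onto the top eigenvectors of a kernel Gram matrix. The sketch below assumes a precomputed symmetric Gram matrix `K` (for example an NTK evaluated on the training set) and labels `y`. The function name and the "fraction of label norm captured" metric are illustrative assumptions, not necessarily the paper's exact definition.

```python
import numpy as np

def top_eigvec_label_alignment(K: np.ndarray, y: np.ndarray, k: int) -> float:
    """Fraction of ||y||^2 captured by the top-k eigenvectors of a symmetric
    n x n kernel Gram matrix K. A value of 1 means y lies entirely in their span."""
    eigvals, eigvecs = np.linalg.eigh(K)                # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]               # indices of the k largest
    top = eigvecs[:, order]                             # n x k orthonormal columns
    proj = top.T @ y                                    # coordinates of y in that basis
    return float(np.sum(proj ** 2) / np.sum(y ** 2))

y = np.array([1.0, -1.0, 1.0, -1.0])

# Kernel whose dominant direction is the label vector: alignment close to 1.
K_aligned = np.outer(y, y) + 0.1 * np.eye(4)
print(top_eigvec_label_alignment(K_aligned, y, k=1))

# Kernel whose dominant direction is orthogonal to the labels: alignment close to 0.
u = np.ones(4)
K_unaligned = np.outer(u, u) + 0.1 * np.eye(4)
print(top_eigvec_label_alignment(K_unaligned, y, k=1))
```

Comparing this quantity for the kernel at initialization and at the end of training would show whether the top eigenvectors became more informative about the labels.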
Abstract
We study how neural networks compress uninformative input space in models where data lie in $d$ dimensions, but whose labels only vary within a linear manifold of dimension $d_\parallel < d$. We show that for a one-hidden-layer network initialized with infinitesimal weights (i.e. in the feature learning regime) trained with gradient descent, the first layer of weights evolves to become nearly insensitive to the $d_\perp = d - d_\parallel$ uninformative directions. These are effectively compressed by a factor $\lambda \sim \sqrt{p}$, where $p$ is the size of the training set. We quantify the benefit of such a compression on the test error $\epsilon$. For large initialization of the weights (the lazy training regime), no compression occurs and for regular boundaries separating labels we find that $\epsilon \sim p^{-\beta}$, with $\beta_\text{Lazy} = d/(3d-2)$. Compression improves the learning…
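The abstract's scaling laws are easy to turn into numbers. The sketch below encodes the lazy-regime exponent $\beta_\text{Lazy} = d/(3d-2)$, the error ratio implied by $\epsilon \sim p^{-\beta}$ when the training set grows, and the compression factor $\lambda \sim \sqrt{p}$. The function names are hypothetical, and since these are scaling relations, the unknown prefactors are simply set to 1.

```python
import math

def beta_lazy(d: int) -> float:
    """Lazy-regime learning-curve exponent for regular label boundaries: d / (3d - 2)."""
    return d / (3 * d - 2)

def predicted_error_ratio(p_small: int, p_large: int, beta: float) -> float:
    """Ratio eps(p_large) / eps(p_small) implied by eps ~ p^(-beta)."""
    return (p_large / p_small) ** (-beta)

def compression_factor(p: int) -> float:
    """Order-of-magnitude compression of uninformative directions, lambda ~ sqrt(p).
    The prefactor is unknown and set to 1 here."""
    return math.sqrt(p)

print(beta_lazy(2))                            # 0.5
print(predicted_error_ratio(100, 400, 0.5))    # 0.5: 4x more data halves the error
print(compression_factor(10_000))              # 100.0
```

For example, with $d = 10$ the lazy exponent is $10/28 \approx 0.36$, so quadrupling $p$ only reduces the test error by a factor of about 0.61. This slow decay in high dimension is what compression of the uninformative directions improves on.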
Taxonomy
Methods: Neural Tangent Kernel
