What Can ResNet Learn Efficiently, Going Beyond Kernels?
Zeyuan Allen-Zhu, Yuanzhi Li

TL;DR
This paper shows that neural networks, specifically ResNet, can efficiently learn certain functions with smaller test error than kernel methods. The advantage comes from hierarchical learning across layers, and the result holds without distributional assumptions.
Contribution
It proves neural networks can learn a class of functions more effectively than kernels in a distribution-free setting, highlighting hierarchical learning as a key advantage.
Findings
Neural networks can learn certain functions with much smaller test error than kernels.
Hierarchical learning reduces sample complexity compared to kernel methods.
ResNet has a computational complexity advantage over other learning methods.
Abstract
How can neural networks such as ResNet efficiently learn CIFAR-10 with test accuracy more than 96%, while other methods, especially kernel methods, fall relatively behind? Can we provide more theoretical justification for this gap? Recently, an influential line of work has related neural networks to kernels in the over-parameterized regime, proving that they can learn certain concept classes that are also learnable by kernels with similar test error. Yet, can neural networks provably learn some concept class BETTER than kernels? We answer this positively in the distribution-free setting. We prove neural networks can efficiently learn a notable class of functions, including those defined by three-layer residual networks with smooth activations, without any distributional assumption. At the same time, we prove there are simple functions in this class such that with the same number of…
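The abstract refers to functions defined by three-layer residual networks with smooth activations. As a rough illustration of that shape only, here is a minimal sketch in plain Python. The choice of tanh as the smooth activation, the layer sizes, the random weights, and all function names (`matvec`, `smooth`, `residual_forward`) are assumptions made for this example and are not taken from the paper.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]


def smooth(v):
    """Apply a smooth activation elementwise (tanh is an assumed choice)."""
    return [math.tanh(t) for t in v]


def random_matrix(rows, cols, rng):
    """Gaussian matrix scaled by 1/sqrt(cols); an illustrative initialization."""
    scale = 1.0 / math.sqrt(cols)
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def residual_forward(x, W, U, A):
    """Three-layer network with one skip connection.

    h   = smooth(W x)       first hidden layer
    r   = smooth(U h)       second hidden layer, computed from h
    out = A (h + r)         output layer reads h plus the residual branch
    """
    h = smooth(matvec(W, x))
    r = smooth(matvec(U, h))
    skip_sum = [h_i + r_i for h_i, r_i in zip(h, r)]
    return matvec(A, skip_sum)


# Hypothetical sizes: input dimension d, hidden width m, output dimension k.
rng = random.Random(0)
d, m, k = 4, 8, 2
W = random_matrix(m, d, rng)
U = random_matrix(m, m, rng)
A = random_matrix(k, m, rng)
x = [rng.gauss(0.0, 1.0) for _ in range(d)]

y = residual_forward(x, W, U, A)
```

In this sketch the second hidden layer only adds a correction on top of the first layer's features, so setting `U` to zero reduces the output to `A` applied to `h`. That layering is one way to picture the hierarchical learning the summary describes, though the paper's exact construction may differ.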
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications
Methods: Average Pooling · 1x1 Convolution · Batch Normalization · Bottleneck Residual Block · Global Average Pooling · Residual Block · Kaiming Initialization · Max Pooling · Residual Connection
