The Sample Complexity of One-Hidden-Layer Neural Networks
Gal Vardi, Ohad Shamir, Nathan Srebro

TL;DR
This paper studies norm-based uniform convergence bounds, and hence sample complexity, for scalar-valued one-hidden-layer neural networks. It shows that the type of norm constraint on the hidden layer matters and identifies specific settings where spectral norm control suffices.
Contribution
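It shows that controlling only the spectral norm of the hidden-layer weight matrix is in general insufficient for width-independent uniform convergence, while the stronger Frobenius norm control is sufficient. Spectral norm control does suffice in two settings: sufficiently smooth activation functions and certain convolutional networks.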
Findings
Frobenius norm control ensures uniform convergence regardless of network width.
Spectral norm control is sufficient for smooth activation functions and some convolutional networks.
Sample complexity depends on parameters like patch overlap and number of patches in convolutional networks.
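The gap between the two norm constraints in these findings can be illustrated numerically. The sketch below is not from the paper: the function names, the tanh activation, and the identity weight matrix are illustrative assumptions. It relies only on the standard fact that an n-by-n identity matrix has spectral norm 1 and Frobenius norm sqrt(n), so a spectral norm bound alone lets the Frobenius norm grow with the width.

```python
import numpy as np

def spectral_norm(W):
    # Largest singular value of W.
    return float(np.linalg.norm(W, ord=2))

def frobenius_norm(W):
    # Square root of the sum of squared entries of W.
    return float(np.linalg.norm(W, ord="fro"))

def one_hidden_layer(x, W, u, activation=np.tanh):
    # Scalar-valued one-hidden-layer network: u^T activation(W x).
    # The tanh default is an illustrative choice, not the paper's.
    return float(u @ activation(W @ x))

def norm_gap(width):
    # Hypothetical example matrix: the identity, whose spectral norm
    # stays at 1 while its Frobenius norm grows as sqrt(width).
    W = np.eye(width)
    return spectral_norm(W), frobenius_norm(W)

if __name__ == "__main__":
    for n in (4, 64, 256):
        s, f = norm_gap(n)
        print(f"width={n}: spectral={s:.3f}, frobenius={f:.3f}")
```

Under a Frobenius norm bound this family of matrices is excluded once the width is large enough, which is one informal way to see why the Frobenius constraint is the stronger of the two.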
Abstract
We study norm-based uniform convergence bounds for neural networks, aiming at a tight understanding of how these are affected by the architecture and type of norm constraint, for the simple class of scalar-valued one-hidden-layer networks, and inputs bounded in Euclidean norm. We begin by proving that in general, controlling the spectral norm of the hidden layer weight matrix is insufficient to get uniform convergence guarantees (independent of the network width), while a stronger Frobenius norm control is sufficient, extending and improving on previous work. Motivated by the proof constructions, we identify and analyze two important settings where (perhaps surprisingly) a mere spectral norm control turns out to be sufficient: First, when the network's activation functions are sufficiently smooth (with the result extending to deeper networks); and second, for certain types of…
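For readers who want the setting in symbols, the following is a sketch of the kind of function class and quantity the abstract describes. The notation (the bounds b, B, b_x, the sample size m, the loss \ell, and the distribution \mathcal{D}) is assumed here for illustration and may differ from the paper's; \|W\| stands for either the spectral or the Frobenius norm, depending on the constraint considered.

```latex
% Scalar-valued one-hidden-layer networks with norm-bounded weights,
% on inputs bounded in Euclidean norm (notation assumed, not the paper's).
\mathcal{F} = \left\{ x \mapsto u^\top \sigma(Wx) \;:\; \|u\|_2 \le b,\ \|W\| \le B \right\},
\qquad \|x\|_2 \le b_x .

% Uniform convergence: the worst-case gap over the class between the
% empirical loss on m samples and the expected loss.
\sup_{f \in \mathcal{F}} \left| \frac{1}{m} \sum_{i=1}^{m} \ell\bigl(f(x_i), y_i\bigr)
  - \mathbb{E}_{(x,y) \sim \mathcal{D}} \bigl[ \ell\bigl(f(x), y\bigr) \bigr] \right| .
```

A width-independent guarantee means this gap can be bounded in terms of b, B, b_x, and m alone, with no dependence on the number of rows of W.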
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Advanced Memory and Neural Computing
