On the Sample Complexity of Two-Layer Networks: Lipschitz vs. Element-Wise Lipschitz Activation
Amit Daniely, Elad Granot

TL;DR
This paper analyzes the sample complexity of two-layer neural networks with Lipschitz activation functions. It shows that with element-wise activations the sample complexity depends only logarithmically on width, whereas certain non-element-wise activations force a linear dependency on width.
Contribution
It establishes that the element-wise property of the activation function is essential for a sample complexity bound that depends only logarithmically on network width.
Findings
Logarithmic width dependency of sample complexity for element-wise activations
Linear width dependency for certain non-element-wise activations
Development of new techniques using Approximate Description Length (ADL)
Abstract
We investigate the sample complexity of bounded two-layer neural networks using different activation functions. In particular, we consider the class $\mathcal{H}$ of two-layer networks with hidden-layer weight matrix $W$, output weight vector $\mathbf{v}$, and activation $\sigma$, where the spectral norm of $W$ and $\mathbf{v}$ is bounded by $O(1)$, the Frobenius norm of $W$ is bounded from its initialization by $R > 0$, and $\sigma$ is a Lipschitz activation function. We prove that if $\sigma$ is element-wise, then the sample complexity of $\mathcal{H}$ has only logarithmic dependency in width and that this complexity is tight, up to logarithmic factors. We further show that the element-wise property of $\sigma$ is essential for a logarithmic dependency bound in width, in the sense…
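To make the constraints in the abstract concrete, the following is a minimal sketch of a bounded two-layer network and a membership check for the class. It is illustrative only and not code from the paper. The function names, the use of ReLU as the element-wise activation, the placement of the bias inside the activation, and the explicit numeric `spectral_bound` standing in for the constant bound are all assumptions. The unit-ball projection is included only as a simple Lipschitz map that is not element-wise; it is not the construction used in the paper's lower bound.

```python
import math

def relu(z):
    """Element-wise activation: output i depends only on input i."""
    return [max(0.0, t) for t in z]

def project_to_unit_ball(z):
    """A 1-Lipschitz map that is NOT element-wise: every output
    coordinate depends on the norm of the whole input vector."""
    norm = math.sqrt(sum(t * t for t in z))
    scale = 1.0 / max(1.0, norm)
    return [t * scale for t in z]

def matvec(W, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def forward(v, W, b, x, sigma=relu):
    """Two-layer network x -> <v, sigma(W x + b)> (bias placement assumed)."""
    pre = [h + bi for h, bi in zip(matvec(W, x), b)]
    return sum(vi * hi for vi, hi in zip(v, sigma(pre)))

def frobenius_distance(W, W0):
    """Frobenius norm of W - W0, the distance from initialization."""
    return math.sqrt(sum((a - c) ** 2
                         for row_w, row_0 in zip(W, W0)
                         for a, c in zip(row_w, row_0)))

def spectral_norm(W, iters=200):
    """Largest singular value, estimated by power iteration on W^T W.
    Sketch only: a fixed start vector can fail in degenerate cases."""
    rows, cols = len(W), len(W[0])
    u = [1.0 / math.sqrt(cols)] * cols
    for _ in range(iters):
        Wu = matvec(W, u)
        z = [sum(W[i][j] * Wu[i] for i in range(rows)) for j in range(cols)]
        norm = math.sqrt(sum(t * t for t in z))
        if norm == 0.0:
            return 0.0
        u = [t / norm for t in z]
    return math.sqrt(sum(t * t for t in matvec(W, u)))

def in_class(v, W, W0, spectral_bound, R):
    """Check the norm constraints described in the abstract.
    For a vector, the spectral norm equals the Euclidean norm."""
    v_norm = math.sqrt(sum(t * t for t in v))
    return (spectral_norm(W) <= spectral_bound
            and v_norm <= spectral_bound
            and frobenius_distance(W, W0) <= R)
```

For example, with `W` the 2x2 identity, `W0` the zero matrix, and `v = [1.0, -1.0]`, the call `in_class(v, W, W0, spectral_bound=2.0, R=1.5)` passes, while shrinking `R` to `1.0` fails because the Frobenius distance from initialization is the square root of 2.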
Taxonomy
Topics: Advanced Memory and Neural Computing · Ferroelectric and Negative Capacitance Devices · Machine Learning in Materials Science
