Polylogarithmic width suffices for gradient descent to achieve arbitrarily small test error with shallow ReLU networks
Ziwei Ji, Matus Telgarsky

TL;DR
This paper demonstrates that shallow ReLU networks with polylogarithmic width can achieve arbitrarily small test error using gradient descent, significantly reducing the width requirements compared to previous polynomial bounds.
Contribution
It proves that polylogarithmic width suffices for gradient descent to reach low test error, improving over prior polynomial width bounds for shallow ReLU networks.
Findings
Polylogarithmic width is sufficient for gradient descent to achieve small test error.
Gradient descent with polylogarithmic width and a sample size polynomial in the inverse target error converges to low test error.
Analysis based on the separation margin of the limiting kernel provides tight sample complexity bounds.
Abstract
Recent theoretical work has guaranteed that overparameterized networks trained by gradient descent achieve arbitrarily low training error, and sometimes even low test error. The required width, however, is always polynomial in at least one of the sample size $n$, the (inverse) target error $1/\epsilon$, and the (inverse) failure probability $1/\delta$. This work shows that $\widetilde{\Theta}(1/\epsilon)$ iterations of gradient descent with $\widetilde{\Omega}(1/\epsilon^2)$ training examples on two-layer ReLU networks of any width exceeding $\mathrm{polylog}(n, 1/\epsilon, 1/\delta)$ suffice to achieve a test misclassification error of $\epsilon$. We also prove that stochastic gradient descent can achieve $\epsilon$ test error with polylogarithmic width and $\widetilde{\Theta}(1/\epsilon)$ samples. The analysis relies upon the separation margin of the limiting kernel, which is guaranteed…
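To make the setting in the abstract concrete, here is a minimal NumPy sketch of full-batch gradient descent on a two-layer ReLU network with the logistic loss. It is an illustration under stated assumptions, not the authors' code: the function names (`init_network`, `forward`, `logistic_risk`, `gradient`, `run_demo`), the choice to train only the hidden layer with fixed random output signs, the toy linearly separable data, and the width, step size, and iteration count are all hypothetical and not taken from the paper's theorems.

```python
import numpy as np


def init_network(d, m, rng):
    """Gaussian hidden weights W (m x d) and fixed random output signs a (assumed setup)."""
    W = rng.standard_normal((m, d))
    a = rng.choice([-1.0, 1.0], size=m)
    return W, a


def forward(W, a, X):
    """f(x) = (1 / sqrt(m)) * sum_s a_s * relu(<w_s, x>) for each row x of X."""
    m = W.shape[0]
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)


def logistic_risk(W, a, X, y):
    """Empirical logistic risk: mean of ln(1 + exp(-y * f(x)))."""
    return float(np.mean(np.logaddexp(0.0, -y * forward(W, a, X))))


def gradient(W, a, X, y):
    """Gradient of the empirical logistic risk with respect to W only."""
    n, m = X.shape[0], W.shape[0]
    pre = X @ W.T                                   # (n, m) pre-activations
    margins = y * (np.maximum(pre, 0.0) @ a) / np.sqrt(m)
    # d/df ln(1 + exp(-y f)) = -y * sigmoid(-y f); tanh form avoids overflow.
    coef = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))
    active = (pre > 0).astype(float)                # ReLU derivative, (n, m)
    return ((active * coef[:, None]).T @ X) * (a[:, None] / np.sqrt(m)) / n


def run_demo(seed=0, n=200, d=5, m=64, steps=200, eta=0.5):
    """Train on toy separable data; all sizes here are arbitrary illustration values."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
    u = rng.standard_normal(d)
    y = np.where(X @ u >= 0, 1.0, -1.0)             # labels from a linear separator
    W, a = init_network(d, m, rng)
    initial = logistic_risk(W, a, X, y)
    for _ in range(steps):
        W = W - eta * gradient(W, a, X, y)          # plain gradient descent step
    final = logistic_risk(W, a, X, y)
    return initial, final


if __name__ == "__main__":
    before, after = run_demo()
    print(f"logistic risk before: {before:.4f}, after: {after:.4f}")
```

The sketch only shows the training procedure whose width requirement the paper studies; it does not reproduce the polylogarithmic-width guarantee, which is a statement about how large `m` must be relative to $n$, $1/\epsilon$, and $1/\delta$.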
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM
