Achieving Small Test Error in Mildly Overparameterized Neural Networks

Shiyu Liang; Ruoyu Sun; R. Srikant

arXiv:2104.11895·cs.LG·April 27, 2021·1 cites

Open Access

TL;DR

This paper shows that, for binary classification with a two-layer mildly over-parameterized ReLU network, a point with small test error can be found in polynomial time under certain conditions.

Contribution

The work gives polynomial-time algorithms for finding low-test-error points in mildly over-parameterized neural networks, extending results beyond the very large widths required by neural tangent kernel analyses.

Findings

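01 With explicit regularization, all local minima of the loss landscape, along with certain points that are stationary only in certain directions, achieve small test error.

02 For convolutional neural nets, an algorithm finds one of these points in time polynomial in the input dimension and the number of data points.

03 Under certain data assumptions, fully connected nets also admit a polynomial-time algorithm for finding low-error points.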

Abstract

Recent theoretical works on over-parameterized neural nets have focused on two aspects: optimization and generalization. Many existing works that study optimization and generalization together are based on the neural tangent kernel and require a very large width. In this work, we are interested in the following question: for a binary classification problem with a two-layer mildly over-parameterized ReLU network, can we find a point with small test error in polynomial time? We first show that the landscape of loss functions with explicit regularization has the following property: all local minima, and certain other points that are stationary only in certain directions, achieve small test error. We then prove that for convolutional neural nets, there is an algorithm that finds one of these points in polynomial time (in the input dimension and the number of data points). In addition, we prove…
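
As a rough illustration of the setting the abstract describes (a two-layer ReLU network trained for binary classification with an explicitly regularized loss), the sketch below evaluates such an objective, computes its gradient norm as a stationarity measure, and runs plain gradient descent on toy data. The logistic loss, the squared-norm penalty, plain gradient descent, the toy data, and every function name are assumptions made for illustration; the paper's own loss, regularizer, and algorithms are not given in this excerpt.

```python
import math
import random


def softplus(z):
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(params, x):
    # Two-layer ReLU network: f(x) = sum_j a_j * relu(<w_j, x>).
    W, a = params
    pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
    return sum(aj * max(p, 0.0) for aj, p in zip(a, pre))


def regularized_loss(params, data, lam):
    # Mean logistic loss plus lam * squared norm of all weights
    # (both are illustrative choices, not the paper's objective).
    W, a = params
    fit = sum(softplus(-y * forward(params, x)) for x, y in data) / len(data)
    penalty = sum(w * w for row in W for w in row) + sum(v * v for v in a)
    return fit + lam * penalty


def gradient(params, data, lam):
    # Gradient of regularized_loss; the ReLU derivative at 0 is taken as 0.
    W, a = params
    gW = [[2.0 * lam * w for w in row] for row in W]
    ga = [2.0 * lam * v for v in a]
    for x, y in data:
        pre = [sum(w * xi for w, xi in zip(row, x)) for row in W]
        f = sum(aj * max(p, 0.0) for aj, p in zip(a, pre))
        dfit = -y * sigmoid(-y * f) / len(data)  # d(mean loss)/d f
        for j, p in enumerate(pre):
            if p > 0.0:
                ga[j] += dfit * p
                for i, xi in enumerate(x):
                    gW[j][i] += dfit * a[j] * xi
    return gW, ga


def grad_norm(params, data, lam):
    # Euclidean norm of the full gradient; zero at stationary points.
    gW, ga = gradient(params, data, lam)
    return math.sqrt(sum(g * g for row in gW for g in row) + sum(g * g for g in ga))


def gradient_descent(params, data, lam, lr, steps):
    # Plain gradient descent; returns new parameters without mutating the input.
    W = [row[:] for row in params[0]]
    a = params[1][:]
    for _ in range(steps):
        gW, ga = gradient((W, a), data, lam)
        W = [[w - lr * g for w, g in zip(row, grow)] for row, grow in zip(W, gW)]
        a = [v - lr * g for v, g in zip(a, ga)]
    return W, a


def training_error(params, data):
    # Fraction of points whose label disagrees with the sign of f(x).
    return sum(1 for x, y in data if y * forward(params, x) <= 0.0) / len(data)


def init_params(width, dim):
    # Small random initial weights (hypothetical initialization scheme).
    W = [[random.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
    a = [random.uniform(-0.5, 0.5) for _ in range(width)]
    return W, a


def make_toy_data(n):
    # Toy 2-D data labeled by the sign of the first coordinate.
    points = [[random.uniform(-1.0, 1.0), random.uniform(-1.0, 1.0)] for _ in range(n)]
    return [(x, 1 if x[0] > 0 else -1) for x in points]
```

In this toy objective the all-zero parameter vector has zero gradient while its loss stays at log 2, so a small gradient norm alone does not certify a low-error point. Calling `gradient_descent(init_params(4, 2), make_toy_data(40), 0.01, 0.1, 300)` and then `training_error` on the result gives a quick way to experiment with the width and the regularization strength `lam`.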

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Neural Network Applications · Machine Learning and ELM