Learning ReLU Networks on Linearly Separable Data: Algorithm, Optimality, and Generalization

Gang Wang; Georgios B. Giannakis; Jie Chen

arXiv:1808.04685 · stat.ML · May 1, 2019

TL;DR

This paper introduces a novel stochastic gradient descent algorithm for training single-hidden-layer ReLU networks on linearly separable data, achieving global optimality without assumptions on data distribution or network size, and provides generalization guarantees.

Contribution

It presents the first provably globally optimal SGD algorithm for ReLU networks on linearly separable data without distribution or size assumptions.

Findings

01  Algorithm converges to global minimum despite non-convexity.

02  No assumptions on data distribution, network size, or initialization.

03  Provides generalization bounds based on compression arguments.

Abstract

Neural networks with REctified Linear Unit (ReLU) activation functions (a.k.a. ReLU networks) have achieved great empirical success in various domains. Nonetheless, existing results for learning ReLU networks either pose assumptions on the underlying data distribution being e.g. Gaussian, or require the network size and/or training size to be sufficiently large. In this context, the problem of learning a two-layer ReLU network is approached in a binary classification setting, where the data are linearly separable and a hinge loss criterion is adopted. Leveraging the power of random noise perturbation, this paper presents a novel stochastic gradient descent (SGD) algorithm, which can provably train any single-hidden-layer ReLU network to attain global optimality, despite the presence of infinitely many bad local minima, maxima, and saddle points in general. This result is the…
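
To make the setting in the abstract concrete, here is a minimal sketch of perturbed SGD on a two-layer ReLU network with hinge loss. It is an illustration under stated assumptions, not the authors' exact algorithm: the truncated abstract does not say where the noise enters, so this sketch assumes it is added to the pre-activation when deciding which hidden units count as active. It also assumes a fixed second layer `v` with alternating signs and zero initialization. The names `noisy_sgd_step`, `train`, `noise_scale`, and `lr` are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def predict(W, v, X):
    """Network output f(x) = v^T relu(W x) for each row x of X."""
    return relu(X @ W.T) @ v


def hinge_loss(W, v, X, y):
    """Mean hinge loss max(0, 1 - y f(x)) over the data set."""
    return np.maximum(0.0, 1.0 - y * predict(W, v, X)).mean()


def noisy_sgd_step(W, v, x, y, lr, noise_scale, rng):
    """One SGD step on a single example (x, y) with labels in {-1, +1}.

    Assumption: Gaussian noise perturbs the activation test only, so a
    unit with a slightly negative pre-activation can still be updated.
    """
    margin = y * predict(W, v, x[None, :])[0]
    if margin >= 1.0:
        return W  # hinge loss is zero on this example: no update
    pre = W @ x
    noise = noise_scale * rng.standard_normal(pre.shape)
    active = (pre + noise >= 0.0).astype(float)
    # Subgradient of the hinge loss w.r.t. row w_j is -y * v_j * active_j * x.
    return W + lr * y * (v * active)[:, None] * x[None, :]


def train(X, y, k=4, lr=0.1, noise_scale=0.1, epochs=50, seed=0):
    """Run perturbed SGD over shuffled passes; only the first layer W is trained."""
    rng = np.random.default_rng(seed)
    W = np.zeros((k, X.shape[1]))
    v = np.where(np.arange(k) % 2 == 0, 1.0, -1.0)  # assumed fixed second layer
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            W = noisy_sgd_step(W, v, X[i], y[i], lr, noise_scale, rng)
    return W, v


# Tiny linearly separable toy set (hypothetical data for illustration).
X_toy = np.array([[2.0, 1.0], [1.0, 2.0], [-1.0, -2.0], [-2.0, -1.0]])
y_toy = np.array([1.0, 1.0, -1.0, -1.0])
W_toy, v_toy = train(X_toy, y_toy)
print("hinge loss after training:", hinge_loss(W_toy, v_toy, X_toy, y_toy))
```

The perturbation matters because a plain ReLU subgradient is zero for every inactive unit, so ordinary SGD can stall at a point where the loss is still positive; randomizing the activation test is one way to keep such units trainable. Whether this particular variant inherits the paper's optimality guarantee is not claimed here.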
