Learning One-hidden-layer ReLU Networks via Gradient Descent

Xiao Zhang; Yaodong Yu; Lingxiao Wang; Quanquan Gu

arXiv:1806.07808·stat.ML·June 21, 2018·51 cites

Open Access

TL;DR

This paper proves that tensor initialization followed by gradient descent learns one-hidden-layer ReLU networks with multiple neurons, converging to the ground-truth parameters at a linear rate up to a statistical error. The authors describe it as, to their knowledge, the first recovery guarantee for practical learning of such networks.

Contribution

It gives algorithm-dependent convergence guarantees for gradient descent on the empirical risk of multi-neuron one-hidden-layer ReLU networks, which the authors state is, to the best of their knowledge, the first recovery guarantee of this kind.

Findings

01. Tensor initialization followed by gradient descent converges at a linear rate to the ground-truth parameters, up to a statistical error.

02. The analysis yields a recovery guarantee for networks with multiple neurons trained by empirical risk minimization.

03. Numerical experiments verify the theoretical findings.

Abstract

We study the problem of learning one-hidden-layer neural networks with the Rectified Linear Unit (ReLU) activation function, where the inputs are sampled from a standard Gaussian distribution and the outputs are generated by a noisy teacher network. We analyze the performance of gradient descent for training such networks based on empirical risk minimization, and provide algorithm-dependent guarantees. In particular, we prove that tensor initialization followed by gradient descent converges to the ground-truth parameters at a linear rate up to some statistical error. To the best of our knowledge, this is the first work characterizing the recovery guarantee for practical learning of one-hidden-layer ReLU networks with multiple neurons. Numerical experiments verify our theoretical findings.
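
The setting in the abstract can be illustrated with a small NumPy sketch. It is not the authors' code and rests on several assumptions that the abstract does not confirm: the teacher network is taken to be a sum of ReLU units with unit second-layer weights, the empirical risk is taken to be the squared loss, and the tensor initialization is replaced by a small random perturbation of the ground-truth weights. All names (`predict`, `risk_gradient`, `run_demo`) and all parameter values are hypothetical.

```python
import numpy as np


def relu(z):
    return np.maximum(z, 0.0)


def predict(W, X):
    # Assumed model: output is the sum of K ReLU units (unit second-layer weights).
    # W has shape (K, d), X has shape (n, d); the result has shape (n,).
    return relu(X @ W.T).sum(axis=1)


def empirical_risk(W, X, y):
    # Assumed loss: half the mean squared residual over the n samples.
    residual = predict(W, X) - y
    return 0.5 * np.mean(residual ** 2)


def risk_gradient(W, X, y):
    # Gradient of the empirical risk with respect to W, using the
    # convention that the ReLU derivative is 0 at non-positive inputs.
    Z = X @ W.T                          # pre-activations, shape (n, K)
    residual = relu(Z).sum(axis=1) - y   # shape (n,)
    active = (Z > 0).astype(X.dtype)     # ReLU derivative, shape (n, K)
    return (active * residual[:, None]).T @ X / X.shape[0]


def run_demo(d=10, K=3, n=5000, noise_std=0.01, step=0.5, iters=500, seed=0):
    rng = np.random.default_rng(seed)
    W_star = rng.standard_normal((K, d))   # ground-truth (teacher) weights
    X = rng.standard_normal((n, d))        # standard Gaussian inputs
    y = predict(W_star, X) + noise_std * rng.standard_normal(n)

    # Assumption: a small perturbation of W_star stands in for the
    # tensor initialization analyzed in the paper.
    W = W_star + 0.1 * rng.standard_normal((K, d))

    errors = [np.linalg.norm(W - W_star)]
    for _ in range(iters):
        W = W - step * risk_gradient(W, X, y)
        errors.append(np.linalg.norm(W - W_star))
    return errors
```

Calling `errors = run_demo()` returns the Frobenius distance to the teacher weights after each step. If the result applies to this simplified setup, plotting `np.log(errors)` should show a roughly straight decline that flattens once the noise-driven statistical error dominates.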

Taxonomy

Topics: Machine Learning and ELM · Advanced Neural Network Applications · Tensor decomposition and applications
