Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations

Pranjal Awasthi; Alex Tang; Aravindan Vijayaraghavan

arXiv:2107.10209 · cs.LG · August 3, 2021 · 1 citation

TL;DR

This paper introduces polynomial-time, sample-efficient algorithms for learning depth-2 neural networks with ReLU activations and bias terms on Gaussian inputs, based on tensor decomposition and Hermite expansion techniques.

Contribution

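The paper gives polynomial-time algorithms for learning depth-2 ReLU networks with nonzero bias terms, a setting that prior work excluded by assuming zero bias. The algorithms robustly decompose higher-order tensors that arise from the Hermite expansion of the network function.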

Findings

01. The algorithms run in polynomial time and are sample efficient.

02. They handle networks with nonzero bias terms.

03. The network parameters are identifiable under minimal assumptions.

Abstract

We present polynomial time and sample efficient algorithms for learning an unknown depth-2 feedforward neural network with general ReLU activations, under mild non-degeneracy assumptions. In particular, we consider learning an unknown network of the form $f(x) = a^{\top} \sigma(W^{\top} x + b)$, where $x$ is drawn from the Gaussian distribution, and $\sigma(t) := \max(t, 0)$ is the ReLU activation. Prior works for learning networks with ReLU activations assume that the bias $b$ is zero. In order to deal with the presence of the bias terms, our proposed algorithm consists of robustly decomposing multiple higher order tensors arising from the Hermite expansion of the function $f(x)$. Using these ideas we also establish identifiability of the network parameters under minimal assumptions.
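To make the setup in the abstract concrete, the sketch below evaluates a network of the form $f(x) = a^{\top} \sigma(W^{\top} x + b)$ on standard Gaussian inputs and compares a Monte Carlo estimate of $E[f(x)\,x]$ with its closed form, $\sum_i a_i \Phi(b_i / \|w_i\|)\, w_i$, which follows from Stein's lemma. This is only the degree-1 term of the Hermite expansion and is not the paper's algorithm, which decomposes higher-order tensors. The function names, the toy parameters, and the choice of a standard Gaussian $N(0, I)$ are assumptions made for illustration.

```python
import math
import numpy as np


def relu_network(X, W, a, b):
    """Evaluate f(x) = a^T relu(W^T x + b) for every row x of X.

    X: (n, d) inputs, W: (d, k) hidden weights,
    a: (k,) output weights, b: (k,) biases.
    """
    return np.maximum(X @ W + b, 0.0) @ a


def std_normal_cdf(z):
    """Standard normal CDF, Phi(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def first_moment_closed_form(W, a, b):
    """E[f(x) x] for x ~ N(0, I): sum_i a_i * Phi(b_i / ||w_i||) * w_i."""
    norms = np.linalg.norm(W, axis=0)  # ||w_i|| for each column w_i of W
    gate = np.array([std_normal_cdf(bi / ni) for bi, ni in zip(b, norms)])
    return W @ (a * gate)


def first_moment_monte_carlo(W, a, b, n_samples=200_000, seed=0):
    """Estimate E[f(x) x] by averaging f(x) * x over Gaussian samples."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, W.shape[0]))
    y = relu_network(X, W, a, b)
    return X.T @ y / n_samples


# Hypothetical toy parameters (d = 3 inputs, k = 2 hidden units).
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, -0.5]])
a = np.array([1.0, -2.0])
b = np.array([0.3, -0.7])

exact = first_moment_closed_form(W, a, b)
estimate = first_moment_monte_carlo(W, a, b)
print("closed form :", exact)
print("monte carlo :", estimate)
```

With zero bias every gate $\Phi(b_i / \|w_i\|)$ equals $1/2$, so the bias enters this moment only through the per-unit scaling of each $w_i$. This gives some intuition for why a single low-order moment does not pin down $b$ and why the abstract speaks of multiple higher-order tensors.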

Videos

Efficient Algorithms for Learning Depth-2 Neural Networks with General ReLU Activations · slideslive

Taxonomy

Topics: Tensor decomposition and applications · Machine Learning and ELM · Model Reduction and Neural Networks