Learning Deep ReLU Networks Is Fixed-Parameter Tractable
Sitan Chen, Adam R. Klivans, Raghu Meka

TL;DR
This paper presents a fixed-parameter tractable algorithm for learning deep ReLU networks with Gaussian inputs: its running time is a fixed polynomial in the input dimension and exponential only in the network's parameters. It handles networks with more than two layers, a setting where gradient-based methods provably fail and prior algorithms required exponential time.
Contribution
The authors introduce a novel algorithm that efficiently learns deep ReLU networks without requiring weight conditioning or positive coefficients, using filtered PCA and tropical geometry techniques.
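To make "filtered PCA" concrete, here is a minimal sketch of one round of the idea: keep only the Gaussian samples whose network output is large, then take the top eigenvectors of the empirical matrix E[1{|y| > tau}(x x^T - I)]. Directions orthogonal to the first-layer weights are independent of the output, so they contribute roughly zero. This is an illustration under assumptions, not the paper's procedure: the function names, the quantile-based threshold, the sample count, and the toy network are all hypothetical choices, and the paper applies the filtering step iteratively with guarantees this sketch does not provide.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def filtered_pca(f, d, k, n_samples=200_000, keep_frac=0.1, seed=0):
    """One illustrative round of a filtered-PCA-style estimate.

    f: maps an (n, d) array of inputs to n outputs (the unknown network).
    Returns a (d, k) matrix whose columns estimate a basis of the
    subspace spanned by the first-layer weight vectors.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n_samples, d))          # Gaussian inputs
    y = f(X)
    # Assumption: pick the threshold as a quantile of |y| (hypothetical choice).
    tau = np.quantile(np.abs(y), 1.0 - keep_frac)
    mask = np.abs(y) > tau
    Xf = X[mask]
    # Empirical estimate of E[ 1{|y| > tau} (x x^T - I) ].
    M = Xf.T @ Xf / n_samples - mask.mean() * np.eye(d)
    eigvals, eigvecs = np.linalg.eigh(M)
    # Keep the k eigenvectors with largest eigenvalue magnitude.
    top = np.argsort(-np.abs(eigvals))[:k]
    return eigvecs[:, top]

# Toy network with two hidden layers (a stand-in, not from the paper):
# x -> relu(W1 x) -> relu(W2 h1), with W1 having orthonormal rows.
d, k = 10, 2
rng = np.random.default_rng(1)
Q, _ = np.linalg.qr(rng.standard_normal((d, k)))
W1 = Q.T                 # shape (k, d), orthonormal rows
W2 = np.ones((1, k))     # single unit in the second hidden layer

def toy_net(X):
    h1 = relu(X @ W1.T)
    h2 = relu(h1 @ W2.T)
    return h2[:, 0]

V = filtered_pca(toy_net, d, k)
P_true = W1.T @ W1       # projector onto the true first-layer subspace
P_est = V @ V.T          # projector onto the estimated subspace
err = np.linalg.norm(P_true - P_est, 2)
print(f"subspace error (spectral norm of projector difference): {err:.3f}")
```

A single round like this can miss directions whose filtered second moment happens to be close to the identity, which is one reason an iterative scheme is needed in general.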
Findings
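The algorithm's running time is a fixed polynomial in the input dimension and exponential only in the network's parameters (number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant).
These are the first nontrivial results for learning networks of depth greater than two.
The guarantees provably cannot be obtained by gradient-based methods, giving the first class of efficiently learnable neural networks that gradient descent fails to learn.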
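The running-time claim can be written schematically in the usual fixed-parameter form. The symbols below are notation introduced here for illustration and are not taken from the paper: d is the input dimension, k the number of hidden units, \ell the depth, \Lambda a bound on the spectral norm of the weight matrices, L the Lipschitz constant of the network, c a constant that does not depend on the network, and g some (exponentially large) function.

```latex
T(d, \theta) \;\le\; g(\theta) \cdot d^{c},
\qquad \theta = (k, \ell, \Lambda, L)
```

The point of the form is that the exponent c of d stays fixed as the network grows; only the factor g(\theta) blows up.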
Abstract
We consider the problem of learning an unknown ReLU network with respect to Gaussian inputs and obtain the first nontrivial results for networks of depth more than two. We give an algorithm whose running time is a fixed polynomial in the ambient dimension and some (exponentially large) function of only the network's parameters. Our bounds depend on the number of hidden units, depth, spectral norm of the weight matrices, and Lipschitz constant of the overall network (we show that some dependence on the Lipschitz constant is necessary). We also give a bound that is doubly exponential in the size of the network but is independent of spectral norm. These results provably cannot be obtained using gradient-based methods and give the first example of a class of efficiently learnable neural networks that gradient descent will fail to learn. In contrast, prior work for learning networks of…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Neural Networks and Applications
Methods: Principal Components Analysis
