Efficiently Learning One-Hidden-Layer ReLU Networks via Schur Polynomials
Ilias Diakonikolas, Daniel M. Kane

TL;DR
This paper presents an efficient algorithm for PAC learning one-hidden-layer ReLU networks with Gaussian inputs. It combines tensor decomposition with the theory of Schur polynomials to achieve complexity that is near-optimal within the class of Correlational Statistical Query algorithms.
Contribution
The paper introduces an algorithm for learning linear combinations of ReLU activations whose sample and computational complexity improves on prior work, with an analysis built on Schur polynomial theory and tensor methods.
Findings
The algorithm achieves sample and computational complexity $(dk/\epsilon)^{O(k)}$, improving on prior bounds whose exponent grows super-polynomially in $k$.
Uses tensor decomposition to identify a subspace such that the higher-order moments are small in the directions orthogonal to it.
Analysis shows higher-moment tensors are small when lower-order moments are controlled.
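To make the moment-subspace idea in the findings above concrete, here is a minimal sketch of its simplest, second-order case. It is not the paper's algorithm, which works with higher-order moment tensors and a tensor decomposition step. It relies on the standard identity that for $f(x) = \sum_i a_i \mathrm{ReLU}(w_i \cdot x)$ with unit vectors $w_i$ and standard Gaussian $x$, the matrix $E[f(x)(xx^T - I)]$ equals $\frac{1}{\sqrt{2\pi}} \sum_i a_i w_i w_i^T$, so its range lies in the span of the weights. The function names, the orthonormal weights, the unit coefficients, and the problem sizes are all assumptions chosen for illustration.

```python
import numpy as np

def hermite2_moment(X, y):
    """Empirical estimate of E[y * (x x^T - I)] from samples (X, y)."""
    n, d = X.shape
    return (X * y[:, None]).T @ X / n - y.mean() * np.eye(d)

def top_subspace(M, k):
    """Orthonormal basis for the k eigen-directions of M with largest |eigenvalue|."""
    eigvals, eigvecs = np.linalg.eigh(M)
    order = np.argsort(-np.abs(eigvals))[:k]
    return eigvecs[:, order]

# Illustrative setup (assumed, not from the paper): orthonormal hidden weights
# and all output coefficients equal to 1.
rng = np.random.default_rng(0)
d, k, n = 10, 3, 200_000
W, _ = np.linalg.qr(rng.standard_normal((d, k)))  # d x k, orthonormal columns
a = np.ones(k)

X = rng.standard_normal((n, d))      # standard Gaussian inputs
y = np.maximum(X @ W, 0.0) @ a       # y = sum_i a_i * ReLU(w_i . x)

M = hermite2_moment(X, y)
V = top_subspace(M, k)

# Spectral norm of the part of W that falls outside the recovered subspace.
residual = np.linalg.norm(W - V @ (V.T @ W), ord=2)
print("residual of true weights outside estimated subspace:", residual)
```

The second-order moment alone can miss directions when the coefficients $a_i$ cancel, which is one reason to look at higher-order moments as the summary describes.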
Abstract
We study the problem of PAC learning a linear combination of $k$ ReLU activations under the standard Gaussian distribution on $\mathbb{R}^d$ with respect to the square loss. Our main result is an efficient algorithm for this learning task with sample and computational complexity $(dk/\epsilon)^{O(k)}$, where $\epsilon > 0$ is the target accuracy. Prior work had given an algorithm for this problem with complexity $(dk/\epsilon)^{h(k)}$, where the function $h(k)$ scales super-polynomially in $k$. Interestingly, the complexity of our algorithm is near-optimal within the class of Correlational Statistical Query algorithms. At a high-level, our algorithm uses tensor decomposition to identify a subspace such that all the $O(k)$-order moments are small in the orthogonal directions. Its analysis makes essential use of the theory of Schur polynomials to show that the higher-moment error tensors…
Taxonomy
Topics: Bayesian Modeling and Causal Inference · Statistical Methods and Bayesian Inference · Random Matrices and Applications
