Distribution-Specific Hardness of Learning Neural Networks
Ohad Shamir

TL;DR
This paper investigates when neural networks can be learned efficiently with gradient-based methods, and gives evidence that neither a 'nice' input distribution nor a 'nice' target function is, on its own, enough to guarantee learnability.
Contribution
It provides evidence that assumptions on the input distribution alone, or on the target function alone, are insufficient to guarantee learnability, and introduces Fourier-based tools for analyzing hardness over Euclidean space.
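For intuition on why Fourier arguments fit Euclidean inputs (an illustration of the general idea, not the paper's own tools), consider a standard Gaussian input x ~ N(0, I_d). The average of a single Fourier component decays as E[cos(<w,x>)] = exp(-||w||^2 / 2), which is the real part of the Gaussian characteristic function. The minimal NumPy sketch below assumes Gaussian inputs and uses hypothetical helper names. It compares a Monte Carlo estimate of this average with the closed form.

```python
import numpy as np


def gaussian_cos_mean_mc(w, n_samples=200_000, seed=0):
    """Monte Carlo estimate of E[cos(<w, x>)] for x ~ N(0, I_d) (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, w.shape[0]))
    return float(np.mean(np.cos(x @ w)))


def gaussian_cos_mean_exact(w):
    """Closed form exp(-||w||^2 / 2): real part of the standard Gaussian characteristic function."""
    return float(np.exp(-0.5 * np.dot(w, w)))


if __name__ == "__main__":
    d = 10
    direction = np.ones(d) / np.sqrt(d)  # unit vector; only ||w|| matters
    for norm in [0.5, 1.0, 2.0, 4.0, 8.0]:
        w = norm * direction
        mc = gaussian_cos_mean_mc(w)
        exact = gaussian_cos_mean_exact(w)
        print(f"||w||={norm:4.1f}  MC={mc:+.4f}  exact={exact:.2e}")
```

For large ||w|| the exact value drops far below the Monte Carlo noise, which is roughly 1/sqrt(n_samples). A sample average therefore cannot distinguish such a high-frequency component from zero.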
Findings
For any member of a class of 'nice' target functions, there exist input distributions under which it is hard to learn.
A family of simple target functions is hard to learn even when the input distribution is 'nice'.
Fourier-based techniques extend hardness analysis to Euclidean space.
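A toy calculation gives the flavor of the second finding. It is a hypothetical illustration, not the paper's construction, and it assumes Gaussian inputs x ~ N(0, I_d), a target sin(<w,x>), a linear model <v,x>, and squared loss F(v) = 0.5 E[(<v,x> - sin(<w,x>))^2]. Since E[x x^T] = I, and Stein's lemma gives E[sin(<w,x>) x] = E[cos(<w,x>)] w = exp(-||w||^2 / 2) w, the gradient is v - exp(-||w||^2 / 2) w. The only part that depends on the target therefore has norm ||w|| exp(-||w||^2 / 2), which is exponentially small in ||w||^2. The sketch below checks the formula by Monte Carlo and reports this target-dependent signal.

```python
import numpy as np


def expected_gradient_exact(v, w):
    """Exact gradient of F(v) = 0.5 * E[(<v, x> - sin(<w, x>))^2], x ~ N(0, I_d).

    E[x x^T] = I gives the term v; Stein's lemma gives
    E[sin(<w, x>) x] = E[cos(<w, x>)] w = exp(-||w||^2 / 2) w.
    """
    return v - np.exp(-0.5 * np.dot(w, w)) * w


def expected_gradient_mc(v, w, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the same gradient, E[(<v, x> - sin(<w, x>)) x]."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((n_samples, v.shape[0]))
    residual = x @ v - np.sin(x @ w)  # shape (n_samples,)
    return (residual[:, None] * x).mean(axis=0)


def target_signal(w):
    """Norm of the only w-dependent gradient term: ||w|| * exp(-||w||^2 / 2)."""
    norm = np.linalg.norm(w)
    return float(norm * np.exp(-0.5 * norm**2))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    d = 20
    for norm in [1.0, 3.0, 6.0, 10.0]:
        w = rng.standard_normal(d)
        w *= norm / np.linalg.norm(w)  # rescale to the requested norm
        print(f"||w||={norm:4.1f}  w-dependent gradient norm = {target_signal(w):.2e}")
```

A linear model cannot fit this target in any case. The point is only that the expected gradient carries almost no information about w once ||w|| is moderately large.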
Abstract
Although neural networks are routinely and successfully trained in practice using simple gradient-based methods, most existing theoretical results are negative, showing that learning such networks is difficult, in a worst-case sense over all data distributions. In this paper, we take a more nuanced view, and consider whether specific assumptions on the "niceness" of the input distribution, or "niceness" of the target function (e.g. in terms of smoothness, non-degeneracy, incoherence, random choice of parameters etc.), are sufficient to guarantee learnability using gradient-based methods. We provide evidence that neither class of assumptions alone is sufficient: On the one hand, for any member of a class of "nice" target functions, there are difficult input distributions. On the other hand, we identify a family of simple target functions, which are difficult to learn even if the input…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and Algorithms · Domain Adaptation and Few-Shot Learning
