Hardness of Learning Fixed Parities with Neural Networks
Itamar Shoshani, Ohad Shamir

TL;DR
This paper demonstrates that fixed parity functions are hard to learn with standard neural network training methods, explaining practical difficulties observed in learning such functions.
Contribution
The paper proves that training one-hidden-layer ReLU networks with perturbed gradient descent on any fixed parity function of some minimal size fails to produce a meaningful predictor. The proof is supported by new Fourier analysis results.
Findings
Perturbed gradient descent on one-hidden-layer ReLU networks fails to learn fixed parity functions of some minimal size.
Fourier coefficients of linear threshold functions decay rapidly.
Fixed parity functions are inherently hard for standard neural network training.
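To make the Fourier-analytic finding concrete, here is a minimal sketch that computes, by exhaustive enumeration, the Fourier coefficient of one linear threshold function on the full parity over {-1, +1}^n. It is an illustration only and does not reproduce the paper's bounds. The majority function is an assumed example of a linear threshold function, and the helper names (`hypercube`, `parity`, `fourier_coefficient`, `majority`, `full_parity_coefficients`) are hypothetical, not taken from the paper.

```python
import itertools
import math


def hypercube(n):
    """All 2^n points of {-1, +1}^n, as tuples."""
    return list(itertools.product((-1, 1), repeat=n))


def parity(x, subset):
    """Parity chi_S(x): the product of the coordinates x_i for i in S."""
    return math.prod(x[i] for i in subset)


def fourier_coefficient(f, n, subset):
    """E_x[f(x) * chi_S(x)] under the uniform distribution on {-1, +1}^n."""
    points = hypercube(n)
    return sum(f(x) * parity(x, subset) for x in points) / len(points)


def majority(x):
    """Assumed example of a linear threshold function: sign(x_1 + ... + x_n).

    Intended for odd n, so that the sum is never zero.
    """
    return 1 if sum(x) > 0 else -1


def full_parity_coefficients(sizes=(3, 5, 7, 9)):
    """Coefficient of majority on the full parity (S = all coordinates)."""
    return {n: fourier_coefficient(majority, n, range(n)) for n in sizes}


if __name__ == "__main__":
    for n, coeff in full_parity_coefficients().items():
        print(f"n={n}: |coefficient on full parity| = {abs(coeff):.4f}")
```

The enumeration costs 2^n evaluations, so it is only practical for small n. It measures how correlated a single threshold unit is with the full parity target, which is the kind of quantity the Fourier results concern.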
Abstract
Learning parity functions is a canonical problem in learning theory, which although computationally tractable, is not amenable to standard learning algorithms such as gradient-based methods. This hardness is usually explained via statistical query lower bounds [Kearns, 1998]. However, these bounds only imply that for any given algorithm, there is some worst-case parity function that will be hard to learn. Thus, they do not explain why fixed parities - say, the full parity function over all coordinates - are difficult to learn in practice, at least with standard predictors and gradient-based methods [Abbe and Boix-Adsera, 2022]. In this paper, we address this open problem, by showing that for any fixed parity of some minimal size, using it as a target function to train one-hidden-layer ReLU networks with perturbed gradient descent will fail to produce anything meaningful. To establish…
Taxonomy
Topics: Neural Networks and Applications
