On the non-universality of deep learning: quantifying the cost of symmetry
Emmanuel Abbe, Enric Boix-Adsera

TL;DR
This paper proves limitations on what neural networks trained by noisy gradient descent can efficiently learn whenever training is equivariant, and, under cryptographic assumptions, establishes hardness results for fully-connected networks trained by SGD.
Contribution
It characterizes the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, and extends the merged-staircase necessity result of [ABM22] for learning with latent low-dimensional structure beyond the mean-field regime.
Findings
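Depth-2 fully-connected networks are as powerful as any other depth for weak-learning on the binary hypercube and unit sphere
Equivariant noisy GD training limits which functions on the hypercube and sphere can be learned efficiently
Hardness results, under cryptographic assumptions, for learning with fully-connected networks trained by SGD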
Abstract
We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] to beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD).
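The equivariance condition in the abstract can be illustrated numerically. The sketch below is not taken from the paper and rests on several assumptions: a two-layer tanh network, squared loss, full-batch GD without noise, and a hand-coupled initialization (the rotated run starts from W0 Qᵀ). The function names `train_two_layer`, `predict`, and `equivariance_gap` are hypothetical. It checks that training on orthogonally rotated inputs from the correspondingly rotated initialization gives the same predictions as the original run. For a rotation-invariant initialization such as i.i.d. Gaussian first-layer weights, this coupling suggests why the two runs would agree in distribution.

```python
import numpy as np

def train_two_layer(X, y, W, a, lr=0.1, steps=50):
    """Full-batch GD on mean squared loss for f(x) = a . tanh(W x)."""
    W, a = W.copy(), a.copy()
    n = X.shape[0]
    for _ in range(steps):
        H = np.tanh(X @ W.T)                 # hidden activations, shape (n, m)
        err = H @ a - y                      # residuals, shape (n,)
        grad_a = H.T @ err / n               # gradient w.r.t. a, shape (m,)
        # gradient w.r.t. W, shape (m, d); uses a before it is updated
        grad_W = ((err[:, None] * a[None, :]) * (1.0 - H**2)).T @ X / n
        a -= lr * grad_a
        W -= lr * grad_W
    return W, a

def predict(X, W, a):
    return np.tanh(X @ W.T) @ a

def equivariance_gap(seed=0, n=32, d=6, m=8):
    """Max prediction difference between the original and rotated runs."""
    rng = np.random.default_rng(seed)
    X = rng.choice([-1.0, 1.0], size=(n, d))      # points on the hypercube
    y = X[:, 0] * X[:, 1]                         # illustrative parity target
    W0 = rng.normal(size=(m, d)) / np.sqrt(d)     # Gaussian first layer
    a0 = rng.normal(size=m) / np.sqrt(m)
    Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal matrix
    W1, a1 = train_two_layer(X, y, W0, a0)
    # Rotated problem: inputs x -> Q x, initialization W0 -> W0 Q^T.
    W2, a2 = train_two_layer(X @ Q.T, y, W0 @ Q.T, a0)
    return np.max(np.abs(predict(X, W1, a1) - predict(X @ Q.T, W2, a2)))
```

Calling `equivariance_gap()` returns a value at the level of floating-point rounding error. The first-layer gradient on the rotated data equals the original gradient times Qᵀ, so the relation W' = W Qᵀ is preserved at every step.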
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning
