On the non-universality of deep learning: quantifying the cost of symmetry

Emmanuel Abbe; Enric Boix-Adsera

arXiv:2208.03113·cs.LG·October 17, 2022

TL;DR

This paper proves limits on what neural networks trained by noisy gradient descent can efficiently learn whenever training is equivariant, and gives hardness results under cryptographic assumptions for fully-connected networks trained by SGD.

Contribution

It characterizes the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, and extends the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] beyond the mean-field regime.

Findings

01

For weak learning on the binary hypercube and unit sphere, depth-2 fully-connected networks are as powerful as networks of any other depth

02

Limits on what networks trained by noisy GD can efficiently learn whenever training is equivariant

03

Hardness results, under cryptographic assumptions, for learning with fully-connected networks trained by SGD

Abstract

We prove limitations on what neural networks trained by noisy gradient descent (GD) can efficiently learn. Our results apply whenever GD training is equivariant, which holds for many standard architectures and initializations. As applications, (i) we characterize the functions that fully-connected networks can weak-learn on the binary hypercube and unit sphere, demonstrating that depth-2 is as powerful as any other depth for this task; (ii) we extend the merged-staircase necessity result for learning with latent low-dimensional structure [ABM22] to beyond the mean-field regime. Under cryptographic assumptions, we also show hardness results for learning with fully-connected networks trained by stochastic gradient descent (SGD).
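The notion of equivariant GD training in the abstract can be illustrated with a small sketch. It is not taken from the paper: the depth-2 ReLU network, the target x[0] * x[1], the width, the learning rate, and the names `train`, `predict`, and `equivariance_gap` are all assumptions chosen for illustration, and the noise is taken to be Gaussian noise added to each update. The check permutes the input coordinates and the first-layer initialization in the same way and confirms that the trained function is permuted accordingly. With zero noise this holds run by run, up to floating-point rounding; with noise it should be read as a statement about the distribution of the learned function.

```python
import itertools
import random

def relu(z):
    return z if z > 0.0 else 0.0

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def predict(W, a, x):
    # Depth-2 fully-connected network: f(x) = sum_j a_j * relu(<w_j, x>).
    return sum(a_j * relu(dot(w_j, x)) for w_j, a_j in zip(W, a))

def train(W, a, data, lr=0.05, steps=30, noise_std=0.0, seed=0):
    # Full-batch GD on the mean of 0.5 * (f(x) - y)^2.
    # Assumption: "noisy" means adding N(0, noise_std^2) to every update.
    rng = random.Random(seed)
    W = [row[:] for row in W]
    a = a[:]
    n = len(data)
    for _ in range(steps):
        grad_W = [[0.0] * len(row) for row in W]
        grad_a = [0.0] * len(a)
        for x, y in data:
            pre = [dot(w_j, x) for w_j in W]
            err = sum(a_j * relu(p) for a_j, p in zip(a, pre)) - y
            for j, p in enumerate(pre):
                grad_a[j] += err * relu(p) / n
                if p > 0.0:
                    for i, x_i in enumerate(x):
                        grad_W[j][i] += err * a[j] * x_i / n
        for j in range(len(W)):
            for i in range(len(W[j])):
                W[j][i] += -lr * grad_W[j][i] + rng.gauss(0.0, noise_std)
            a[j] += -lr * grad_a[j] + rng.gauss(0.0, noise_std)
    return W, a

def equivariance_gap(d=4, width=3, seed=1):
    rng = random.Random(seed)
    perm = list(range(d))
    rng.shuffle(perm)
    permute = lambda v: [v[k] for k in perm]

    # All points of the binary hypercube {-1, +1}^d, with a hypothetical target.
    cube = [list(x) for x in itertools.product([-1.0, 1.0], repeat=d)]
    data = [(x, x[0] * x[1]) for x in cube]

    # i.i.d. Gaussian initialization, whose law is permutation-invariant.
    W0 = [[rng.gauss(0.0, 1.0 / d ** 0.5) for _ in range(d)] for _ in range(width)]
    a0 = [rng.gauss(0.0, 1.0 / width ** 0.5) for _ in range(width)]

    # Run 1: original data. Run 2: permuted data with permuted first-layer init.
    W, a = train(W0, a0, data)
    Wp, ap = train([permute(w) for w in W0], a0,
                   [(permute(x), y) for x, y in data])

    # Equivariance: the second network evaluated at permuted inputs
    # should match the first network at the original inputs.
    return max(abs(predict(W, a, x) - predict(Wp, ap, permute(x))) for x in cube)

if __name__ == "__main__":
    print(equivariance_gap())
```

Because the Gaussian initialization has the same distribution before and after permuting coordinates, the two runs are equally likely, which is the sense in which training "cannot tell" a target from its permuted copies.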

Videos

On the non-universality of deep learning: quantifying the cost of symmetry · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Adversarial Robustness in Machine Learning