On the hardness of learning under symmetries
Bobak T. Kiani, Thien Le, Hannah Lawrence, Stefanie Jegelka, Melanie Weber

TL;DR
This paper demonstrates that incorporating symmetries into neural networks does not fundamentally reduce the computational hardness of learning them with gradient descent: the lower bounds it proves remain exponential or superpolynomial in the input dimension.
Contribution
It provides the first theoretical lower bounds showing that symmetry-based inductive biases do not eliminate the inherent hardness of learning neural networks with gradient descent.
Findings
Lower bounds for shallow GNNs and CNNs under permutation groups.
Exponential or superpolynomial complexity in input dimension.
Symmetry does not simplify the fundamental learning difficulty.
Abstract
We study the problem of learning equivariant neural networks via gradient descent. The incorporation of known symmetries ("equivariance") into neural nets has empirically improved the performance of learning pipelines, in domains ranging from biology to computer vision. However, a rich yet separate line of learning theoretic research has demonstrated that actually learning shallow, fully-connected (i.e. non-symmetric) networks has exponential complexity in the correlational statistical query (CSQ) model, a framework encompassing gradient descent. In this work, we ask: are known problem symmetries sufficient to alleviate the fundamental hardness of learning neural nets with gradient descent? We answer this question in the negative. In particular, we give lower bounds for shallow graph neural networks, convolutional networks, invariant polynomials, and frame-averaged networks for…
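As a rough illustration of the two notions the abstract relies on, the sketch below builds a permutation-invariant target by averaging over the symmetric group and answers one correlational statistical query against it. This is not the paper's construction. The names `symmetrize` and `csq_oracle`, the toy target, the sample size, and the tolerance `tau` are all assumptions chosen for illustration. A true CSQ oracle returns any value within `tau` of the population expectation E[g(x)·y], possibly chosen adversarially; here an empirical mean with random noise stands in for it.

```python
import itertools
import random


def symmetrize(f, n):
    """Average f over all permutations of n coordinates (group averaging)."""
    perms = list(itertools.permutations(range(n)))

    def f_sym(x):
        return sum(f([x[i] for i in p]) for p in perms) / len(perms)

    return f_sym


def csq_oracle(query, samples, tau, rng):
    """Simulated correlational query: mean of query(x) * y, off by at most tau.

    Assumption: random noise replaces the adversarial perturbation that the
    CSQ model allows, and an empirical mean replaces the true expectation.
    """
    exact = sum(query(x) * y for x, y in samples) / len(samples)
    return exact + rng.uniform(-tau, tau)


rng = random.Random(0)
n = 4

# Hypothetical target: a permutation-invariant function on {-1, +1}^n.
target = symmetrize(lambda x: x[0] * x[1], n)

xs = [[rng.choice([-1, 1]) for _ in range(n)] for _ in range(2000)]
samples = [(x, target(x)) for x in xs]

# The learner only sees noisy correlations between its query and the labels.
tau = 0.05
answer = csq_oracle(target, samples, tau, rng)
exact = sum(target(x) * y for x, y in samples) / len(samples)
```

A learner in this model never inspects individual labelled examples; it only receives answers such as `answer`. Lower bounds in the CSQ model limit how much any sequence of such queries can reveal, which is why they also apply to gradient descent with noisy gradients.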
Peer Reviews
Decision: ICLR 2024 spotlight
1. The question studied in this paper is closely related to a core question in understanding deep learning: can deep learning benefit from symmetry-inspired algorithmic designs? In this sense I deem the question studied in the paper valuable and this paper's attempt to deal with it respectable. 2. The technical contribution of this paper, although it still depends on some prior work, is, to my understanding, novel enough to be nontrivial. This paper constructed function classes that were no
The weaknesses listed below are, in my opinion, secondary to the contributions of this paper. The approach of this paper in studying the hardness of learning symmetry-enhanced neural networks has certain limitations. It cannot account for all neural architectures at once and requires a specific construction whenever the problem formulation changes even slightly. And the hard function classes, although well designed for the proof, are not very intuitive in terms of broader impact to people wh
1. The problem considered in this work is interesting and well-motivated. Most theoretical prior works on learning neural networks focused on fully connected shallow networks; investigating the learnability of popular and practically relevant classes of neural networks such as GNNs and CNNs (that have more restricted symmetric structure) is a natural next step. 2. The paper provides hardness results for various classes of "symmetric" neural networks in the SQ and CSQ models that are general m
1. The novelty of the techniques and arguments used in the lower bounds provided in this work may be limited, in the sense that most of the claimed results rely heavily on machinery developed in prior works [1,2].
Originality: The authors prove numerous new results on the sample complexity of learning in neural networks, and provide ample empirical support for their work. Quality/clarity: The authors sketch their proofs using careful, clear technical arguments. Additionally, their experiments are simple but clear demonstrations of the practical difficulty of learning networks within the families the authors study. Significance: The authors' work significantly advances progress on the hardness of l
I would've liked a _slightly_ more thorough empirical treatment, if only to make sure that the failure to learn was not due to poor hyperparameter choices / poor initialization etc.
Taxonomy
Topics: Machine Learning and Algorithms · Neural Networks and Applications · Stochastic Gradient Optimization Techniques
