Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study
Yotam Alexander, Yonatan Slutzky, Yuval Ran-Milo, Nadav Cohen

TL;DR
This paper theoretically compares gradient descent and Guess & Check (G&C) as sources of generalization in neural networks. It shows that width and depth have opposite effects on generalization under G&C, challenging conventional wisdom.
Contribution
It provides the first theoretical evidence that generalization under Guess & Check deteriorates with width but improves with depth, proven for matrix factorization (a common testbed for neural networks).
Findings
Generalization under G&C worsens as width increases.
Generalization under G&C improves as depth increases.
Experiments support the theoretical results.
Abstract
Conventional wisdom attributes the mysterious generalization abilities of overparameterized neural networks to gradient descent (and its variants). The recent volume hypothesis challenges this view: it posits that these generalization abilities persist even when gradient descent is replaced by Guess & Check (G&C), i.e., by drawing weight settings until one that fits the training data is found. The validity of the volume hypothesis for wide and deep neural networks remains an open question. In this paper, we theoretically investigate this question for matrix factorization (with linear and non-linear activation), a common testbed in neural network theory. We first prove that generalization under G&C deteriorates with increasing width, establishing what is, to our knowledge, the first case where G&C is provably inferior to gradient descent. Conversely, we prove that generalization under…
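The Guess & Check procedure described in the abstract can be sketched as rejection sampling on a toy matrix-completion instance. This is a minimal illustration under stated assumptions, not the authors' setup: the i.i.d. Gaussian prior with standard deviation `std`, the training tolerance `tol` (fitting exactly has probability zero under a continuous prior), the masked mean-squared-error loss, the placement of the optional elementwise activation, and all function names are hypothetical choices made here.

```python
import numpy as np


def sample_factors(rng, dim, width, depth, std):
    """Draw i.i.d. Gaussian factors (assumed prior) for a depth-`depth` factorization.

    Shapes are (width, dim), (width, width), ..., (dim, width), so the
    end-to-end matrix is dim x dim; for depth 1 a single dim x dim factor is drawn.
    """
    if depth < 1:
        raise ValueError("depth must be at least 1")
    if depth == 1:
        shapes = [(dim, dim)]
    else:
        shapes = [(width, dim)] + [(width, width)] * (depth - 2) + [(dim, width)]
    return [rng.normal(0.0, std, size=s) for s in shapes]


def factorized_matrix(factors, activation=None):
    """End-to-end matrix W_N ... W_1; factors[0] (W_1) is applied first.

    If `activation` is given, it is applied elementwise between layers
    (a hypothetical non-linear variant assumed for illustration).
    """
    out = factors[0]
    for w in factors[1:]:
        if activation is not None:
            out = activation(out)
        out = w @ out
    return out


def masked_mse(pred, target, mask):
    """Mean squared error over the entries where `mask` is True."""
    diff = (pred - target)[mask]
    return float(np.mean(diff ** 2))


def guess_and_check(target, train_mask, width, depth, std=1.0, tol=0.05,
                    max_tries=100_000, activation=None, seed=0):
    """Resample all weights until the training loss is <= tol.

    Returns (test_loss, tries) for the first accepted draw, where the test
    loss is measured on the unobserved entries, or (None, max_tries) if no
    draw fits the observed entries within the budget.
    """
    rng = np.random.default_rng(seed)
    test_mask = ~train_mask
    for tries in range(1, max_tries + 1):
        factors = sample_factors(rng, target.shape[0], width, depth, std)
        pred = factorized_matrix(factors, activation)
        if masked_mse(pred, target, train_mask) <= tol:
            return masked_mse(pred, target, test_mask), tries
    return None, max_tries


if __name__ == "__main__":
    # Toy instance: a rank-1 2x2 matrix with one hidden entry to predict.
    target = np.outer([1.0, 0.5], [1.0, 0.5])
    train_mask = np.array([[True, True], [True, False]])
    for width in (2, 8):
        loss, tries = guess_and_check(target, train_mask, width=width, depth=2)
        print(f"width={width}: test loss={loss}, draws needed={tries}")
```

To probe the width and depth trends the paper proves, one could fix `target` and `train_mask`, vary `width` or `depth`, and average the test loss over many seeds; no particular numbers are implied by this sketch. The number of draws needed grows quickly with the matrix size and the number of observed entries, which is why G&C is only practical on toy instances like this one.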
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Neural Networks and Applications
