Do Neural Networks Need Gradient Descent to Generalize? A Theoretical Study

Yotam Alexander; Yonatan Slutzky; Yuval Ran-Milo; Nadav Cohen

arXiv:2506.03931·cs.LG·December 19, 2025

TL;DR

This paper theoretically compares gradient descent with Guess & Check (G&C) as a way of training neural networks that generalize. It shows that width and depth have opposite effects under G&C: generalization deteriorates as width increases and improves as depth increases.

Contribution

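For matrix factorization, the paper proves that generalization under Guess & Check deteriorates with width but improves with depth. The width result is, to the authors' knowledge, the first case where G&C is provably inferior to gradient descent.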

Findings

01 Generalization under G&C worsens as width increases.

02 Generalization under G&C improves as depth increases.

03 Empirical validation supports the theoretical results.

Abstract

Conventional wisdom attributes the mysterious generalization abilities of overparameterized neural networks to gradient descent (and its variants). The recent volume hypothesis challenges this view: it posits that these generalization abilities persist even when gradient descent is replaced by Guess & Check (G&C), i.e., by drawing weight settings until one that fits the training data is found. The validity of the volume hypothesis for wide and deep neural networks remains an open question. In this paper, we theoretically investigate this question for matrix factorization (with linear and non-linear activation)--a common testbed in neural network theory. We first prove that generalization under G&C deteriorates with increasing width, establishing what is, to our knowledge, the first case where G&C is provably inferior to gradient descent. Conversely, we prove that generalization under…
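
The Guess & Check procedure described in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the Gaussian weight distribution, the tiny 2x2 rank-1 target, the observed-entry mask, the loss tolerance `tol`, and the cap `max_tries` are all assumptions chosen so the example runs quickly. The function name `guess_and_check` is hypothetical.

```python
import numpy as np

def guess_and_check(target, mask, width, depth, tol, max_tries, rng):
    """Draw random factors until their product fits the observed entries.

    Assumptions (not from the paper): i.i.d. standard Gaussian weights,
    a linear factorization, and a mean-squared training loss below `tol`
    as the definition of "fits the training data".
    """
    n = target.shape[0]
    # Layer sizes: n -> width -> ... -> width -> n, with `depth` factors.
    dims = [n] + [width] * (depth - 1) + [n]
    for attempt in range(1, max_tries + 1):
        # Guess: sample every factor afresh.
        factors = [rng.standard_normal((dims[i + 1], dims[i]))
                   for i in range(depth)]
        product = factors[0]
        for f in factors[1:]:
            product = f @ product
        # Check: training loss on the observed entries only.
        train_loss = np.mean((product - target)[mask] ** 2)
        if train_loss < tol:
            return product, attempt
    return None, max_tries  # no fitting draw found within the budget

rng = np.random.default_rng(0)
target = np.ones((2, 2))                        # hypothetical rank-1 ground truth
mask = np.array([[True, True], [True, False]])  # observed (training) entries

product, tries = guess_and_check(
    target, mask, width=2, depth=2, tol=0.05, max_tries=100_000, rng=rng
)
if product is not None:
    # Generalization is judged on the entry that was never observed.
    held_out_error = (product - target)[~mask] ** 2
    print(f"accepted after {tries} draws, held-out squared error {held_out_error[0]:.3f}")
```

Rerunning the sketch with different `width` and `depth` values and averaging the held-out error over many accepted draws is one way to probe the width and depth trends informally. The number of draws needed grows quickly with problem size, so this only works at toy scale.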

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Neural Networks and Applications