On the Generalization Mystery in Deep Learning
Satrajit Chatterjee, Piotr Zielinski

TL;DR
This paper proposes that the coherence of per-example gradients during training explains why over-parameterized neural networks generalize well, offering a metric to predict generalization and insights into training dynamics.
Contribution
The paper introduces a new, easy-to-compute, interpretable metric for gradient coherence and uses it to explain generalization, early learning, and noise robustness in deep neural networks.
Findings
Gradient coherence differs significantly between real and random datasets.
The metric predicts which solutions will generalize well.
Simple modifications to gradient descent can improve generalization.
Abstract
The generalization mystery in deep learning is the following: Why do over-parameterized neural networks trained with gradient descent (GD) generalize well on real datasets even though they are capable of fitting random datasets of comparable size? Furthermore, from among all solutions that fit the training data, how does GD find one that generalizes well (when such a well-generalizing solution exists)? We argue that the answer to both questions lies in the interaction of the gradients of different examples during training. Intuitively, if the per-example gradients are well-aligned, that is, if they are coherent, then one may expect GD to be (algorithmically) stable, and hence generalize well. We formalize this argument with an easy to compute and interpretable metric for coherence, and show that the metric takes on very different values on real and random datasets for several common…
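To make the idea of "well-aligned" per-example gradients concrete, the sketch below computes one plausible alignment score: the squared norm of the average gradient divided by the average squared norm of the individual gradients. This ratio is an assumption chosen for illustration, and the excerpt above does not spell out the paper's exact definition. The function name `coherence` and the toy gradient lists are hypothetical.

```python
# Illustrative sketch only: this ratio is an assumed stand-in for a gradient
# coherence score, not necessarily the metric defined in the paper.

def coherence(per_example_grads):
    """Return ||mean gradient||^2 / mean(||gradient||^2) for a list of vectors."""
    m = len(per_example_grads)
    dim = len(per_example_grads[0])
    # Average gradient over the m examples, one coordinate at a time.
    mean_grad = [sum(g[i] for g in per_example_grads) / m for i in range(dim)]
    mean_grad_sq_norm = sum(x * x for x in mean_grad)
    # Average of the squared norms of the individual gradients.
    mean_sq_norm = sum(sum(x * x for x in g) for g in per_example_grads) / m
    if mean_sq_norm == 0.0:
        return 0.0  # all gradients are zero; report no alignment
    return mean_grad_sq_norm / mean_sq_norm


# Toy per-example gradients (hypothetical values, not from any experiment).
aligned = [[1.0, 0.0]] * 4                      # every example agrees
orthogonal = [[1.0, 0.0, 0.0, 0.0],
              [0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]]             # examples do not interact
opposing = [[1.0, 0.0], [-1.0, 0.0]]            # examples cancel out

print(coherence(aligned))
print(coherence(orthogonal))
print(coherence(opposing))
```

Under this assumed definition, the score is 1 when all examples push the parameters in the same direction, 1/m when m equal-norm gradients are mutually orthogonal, and 0 when they cancel. That matches the intuition in the abstract that aligned gradients reinforce one another while unrelated ones do not.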
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Stochastic Gradient Optimization Techniques · Machine Learning and Data Classification
Methods: Early Stopping
