Generalization in Deep Networks: The Role of Distance from Initialization
Vaishnavh Nagarajan, J. Zico Kolter

TL;DR
This paper investigates why deep neural networks generalize well despite their size. It proposes that their effective capacity is restricted by how far the trained parameters move from their random initialization, a distance that SGD implicitly regularizes.
Contribution
It introduces an initialization-dependent notion of model capacity and provides empirical and theoretical evidence linking it to generalization in deep networks.
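One way to make an initialization-dependent capacity notion concrete is to restrict the hypothesis class to a ball around the particular random initialization. This is an illustrative sketch, not the paper's exact definition. The symbols $\mathcal{H}_r$, $W^{(0)}$ (the sampled initial weights) and $r$ (a radius) are introduced here for illustration:

```latex
\mathcal{H}_r\bigl(W^{(0)}\bigr) \;=\; \bigl\{\, W \;:\; \|W - W^{(0)}\|_2 \le r \,\bigr\}
```

A classical norm-based class such as $\{W : \|W\|_2 \le r\}$ is the same for every draw of the initialization. The class above is not: it is centered on the specific $W^{(0)}$ the network started from. That dependence on the given random initialization is what the paper argues a capacity measure for SGD-trained networks needs.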
Findings
Model capacity is restricted by the distance from initialization.
Empirical evidence shows implicit regularization of the $\ell_2$ distance from initialization.
Theoretical arguments support initialization-dependent capacity notions.
Abstract
Why does training deep neural networks using stochastic gradient descent (SGD) result in a generalization error that does not worsen with the number of parameters in the network? To answer this question, we advocate a notion of effective model capacity that is dependent on a given random initialization of the network and not just the training algorithm and the data distribution. We provide empirical evidence demonstrating that the model capacity of SGD-trained deep networks is in fact restricted through implicit regularization of the distance from the initialization. We also provide theoretical arguments that further highlight the need for initialization-dependent notions of model capacity. We leave as open questions how and why distance from initialization is regularized, and whether it is sufficient to explain generalization.
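To make "distance from initialization" concrete, here is a minimal sketch in pure Python. It assumes a toy overparameterized linear least-squares model (more parameters than training examples) rather than a deep network, and the function names (`l2_distance`, `mse`, `make_data`, `sgd_track_distance`) and all hyperparameters are hypothetical. It trains with plain SGD from a fixed random initialization `w0` and records the $\ell_2$ distance $\|w - w_0\|_2$ after every epoch.

```python
import math
import random


def l2_distance(params, init_params):
    """Euclidean (l2) distance between a parameter vector and its initialization."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(params, init_params)))


def mse(w, xs, ys):
    """Mean of 0.5 * squared residuals of the linear model w . x."""
    total = 0.0
    for x, y in zip(xs, ys):
        r = sum(wj * xj for wj, xj in zip(w, x)) - y
        total += 0.5 * r * r
    return total / len(xs)


def make_data(n, dim, seed=1):
    """Hypothetical synthetic regression data with fewer samples than parameters.
    Inputs are scaled so that ||x||^2 is about 1, which keeps SGD with lr=0.5 stable."""
    rng = random.Random(seed)
    w_true = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    xs = [[rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)] for _ in range(n)]
    ys = [sum(a * b for a, b in zip(w_true, x)) for x in xs]
    return xs, ys


def sgd_track_distance(xs, ys, dim, lr=0.5, epochs=30, seed=0):
    """Plain SGD on the squared loss from a fixed random initialization w0,
    recording ||w - w0||_2 after every epoch."""
    rng = random.Random(seed)
    # Random initialization, kept unchanged as the reference point.
    w0 = [rng.gauss(0.0, 1.0 / math.sqrt(dim)) for _ in range(dim)]
    w = list(w0)
    distances = []
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = xs[i], ys[i]
            r = sum(wj * xj for wj, xj in zip(w, x)) - y
            # Gradient of 0.5 * r**2 with respect to w is r * x.
            w = [wj - lr * r * xj for wj, xj in zip(w, x)]
        distances.append(l2_distance(w, w0))
    return w0, w, distances


if __name__ == "__main__":
    dim, n = 200, 20
    xs, ys = make_data(n, dim)
    w0, w, distances = sgd_track_distance(xs, ys, dim)
    print("initial loss:", mse(w0, xs, ys))
    print("final loss:  ", mse(w, xs, ys))
    print("||w - w0||_2 per epoch:", [round(d, 3) for d in distances])
```

In this linear toy case a standard fact applies: every SGD step adds a multiple of some training input to `w`, so `w - w0` stays in the span of the inputs, and if training reaches zero loss the result is the interpolating solution closest to `w0` in $\ell_2$. Deep networks are nonlinear, so this is only an analogy for the implicit regularization the abstract describes, not the paper's experimental setup.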
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning
