Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization
Navid Azizan, Sahin Lale, Babak Hassibi

TL;DR
This paper investigates stochastic mirror descent in overparameterized nonlinear models, showing convergence to solutions close to the initial point and demonstrating how different implicit regularizations affect generalization in deep learning.
Contribution
It provides theoretical analysis of SMD convergence properties and experimental evidence of how various implicit regularizations influence generalization performance.
Findings
When initialized close enough to the manifold of global minima, SMD converges to a global minimum close to the initial point in overparameterized nonlinear models.
Different SMD variants exhibit distinct generalization performances.
SMD with ℓ10-norm implicit regularization generalizes better than both SGD and SMD with ℓ1-norm implicit regularization.
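To make the comparison between SMD variants concrete, here is a minimal sketch of a single SMD update. It assumes the q-norm potential ψ(w) = (1/q)‖w‖_q^q, a common choice for norm-based mirror descent that this page does not confirm as the paper's exact setup. The function names (`mirror_map`, `inverse_mirror_map`, `smd_step`) are hypothetical and chosen for illustration only. The update is ∇ψ(w_next) = ∇ψ(w) − lr · grad, which reduces to the ordinary SGD step when q = 2.

```python
import numpy as np

def mirror_map(w, q):
    """Gradient of the assumed potential psi(w) = (1/q) * sum(|w_i|^q)."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def inverse_mirror_map(z, q):
    """Inverse of mirror_map; requires q > 1 so the map is invertible."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, grad, lr, q):
    """One SMD update: step in the mirror (dual) domain, then map back.

    grad is the stochastic gradient of the loss at w for one sample or batch.
    With q = 2 both maps are the identity, so this is exactly w - lr * grad.
    """
    z = mirror_map(w, q) - lr * grad
    return inverse_mirror_map(z, q)

# Illustrative values only (not data from the paper).
w = np.array([0.5, -1.2, 2.0])
g = np.array([0.1, -0.3, 0.7])
w_sgd = smd_step(w, g, lr=0.1, q=2)    # identical to plain SGD
w_q10 = smd_step(w, g, lr=0.1, q=10)   # q = 10 variant of the same rule
```

Note that q = 1 is not covered by this sketch, because the map is only invertible for q > 1; how the ℓ1 case is handled in practice is left open here.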
Abstract
Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors that perfectly interpolate the training data). Therefore, it is important to understand which interpolating solutions we converge to, how they depend on the initialization point and the learning algorithm, and whether they lead to different generalization performances. In this paper, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which the popular stochastic gradient descent (SGD) is a special case. Our contributions are both theoretical and experimental. On the theory side, we show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (something…
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Face and Expression Recognition · Stochastic Gradient Optimization Techniques
Methods: Stochastic Gradient Descent
