Stochastic Mirror Descent on Overparameterized Nonlinear Models: Convergence, Implicit Regularization, and Generalization

Navid Azizan; Sahin Lale; Babak Hassibi

arXiv:1906.03830 · cs.LG · June 11, 2019 · 29 cites

TL;DR

This paper investigates stochastic mirror descent in overparameterized nonlinear models, showing convergence to solutions close to the initial point and demonstrating how different implicit regularizations affect generalization in deep learning.

Contribution

It provides theoretical analysis of SMD convergence properties and experimental evidence of how various implicit regularizations influence generalization performance.

Findings

01 SMD converges to a global minimum close to the initial point in overparameterized models, provided the initialization is close enough to the manifold of global minima.

02 Different SMD variants exhibit distinct generalization performances.

03 SMD with an ℓ10-norm potential generalizes better than both SGD and SMD with an ℓ1-norm potential.

Abstract

Most modern learning problems are highly overparameterized, meaning that there are many more parameters than the number of training data points, and as a result, the training loss may have infinitely many global minima (parameter vectors that perfectly interpolate the training data). Therefore, it is important to understand which interpolating solutions we converge to, how they depend on the initialization point and the learning algorithm, and whether they lead to different generalization performances. In this paper, we study these questions for the family of stochastic mirror descent (SMD) algorithms, of which the popular stochastic gradient descent (SGD) is a special case. Our contributions are both theoretical and experimental. On the theory side, we show that in the overparameterized nonlinear setting, if the initialization is close enough to the manifold of global minima (something…
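The relationship between SMD and SGD mentioned in the abstract can be illustrated with a minimal sketch. It assumes a q-norm potential ψ(w) = (1/q)‖w‖_q^q, for which the SMD update ∇ψ(w_new) = ∇ψ(w) − η∇L_i(w) has a closed form, and it uses an overparameterized linear least-squares problem purely for brevity (the paper itself treats nonlinear models). All function names, the step size, and the toy data are illustrative assumptions and are not taken from the authors' repository.

```python
import numpy as np

def mirror_map(w, q):
    """Gradient of the assumed potential psi(w) = (1/q) * ||w||_q^q (elementwise)."""
    return np.sign(w) * np.abs(w) ** (q - 1)

def inverse_mirror_map(z, q):
    """Inverse of mirror_map for q > 1 (elementwise)."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1))

def smd_step(w, grad, lr, q):
    """One SMD step: take the gradient step in the mirror (dual) domain, then map back."""
    z = mirror_map(w, q) - lr * grad
    return inverse_mirror_map(z, q)

def run_smd(X, y, w0, q=2.0, lr=0.01, epochs=200, seed=0):
    """Run SMD on the per-sample squared loss 0.5 * (x_i . w - y_i)^2."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        for i in rng.permutation(X.shape[0]):
            residual = X[i] @ w - y[i]
            grad = residual * X[i]  # gradient of the per-sample loss
            w = smd_step(w, grad, lr, q)
    return w

# Toy overparameterized problem: 5 data points, 50 parameters (hypothetical data).
data_rng = np.random.default_rng(1)
X = data_rng.normal(size=(5, 50))
y = data_rng.normal(size=5)
w0 = 0.01 * data_rng.normal(size=50)

# With q = 2 the mirror map is the identity, so SMD reduces to plain SGD.
w_sgd = run_smd(X, y, w0, q=2.0)
print("max training residual (q=2):", np.max(np.abs(X @ w_sgd - y)))
print("distance from initialization:", np.linalg.norm(w_sgd - w0))
```

Other values of q give different SMD variants; the step size generally needs retuning for each, and this sketch makes no claim about which value generalizes best.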

Code & Models

Repositories

SahinLale/StochasticMirrorDescent
pytorch

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Face and Expression Recognition · Stochastic Gradient Optimization Techniques

Methods: Stochastic Gradient Descent