The Generalization Error of Stochastic Mirror Descent on Over-Parametrized Linear Models
Danil Akhtiamov, Babak Hassibi

TL;DR
This paper analyzes the generalization error of stochastic mirror descent (SMD) in over-parametrized linear models, revealing how different potential functions influence generalization in binary classification tasks.
Contribution
It derives the generalization error of SMD in over-parametrized linear models and compares the effects of different regularizers on performance.
Findings
SMD with the ℓ2 regularizer (i.e., SGD) outperforms SMD with the ℓ1 regularizer in one data model.
SMD with the ℓ1 regularizer outperforms SMD with the ℓ2 regularizer in another data model.
Simulation results validate the theoretical analysis.
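For reference, the correspondence between a regularizer and an SMD variant can be written out using the standard textbook definitions. The symbols below (potential \psi, step size \eta, per-sample loss L_i) are notation assumed here for illustration and are not taken from the paper; how the paper treats the ℓ1 case, whose potential is not strictly convex, is not stated in this summary.

```latex
% Bregman divergence induced by a differentiable, strictly convex potential \psi
D_\psi(w, w') = \psi(w) - \psi(w') - \nabla\psi(w')^\top (w - w')

% SMD step on the loss L_i of the sampled training point, with step size \eta > 0
\nabla\psi(w_t) = \nabla\psi(w_{t-1}) - \eta \, \nabla L_i(w_{t-1})

% Special case \psi(w) = \tfrac{1}{2}\|w\|_2^2: here \nabla\psi(w) = w, so the step is plain SGD
w_t = w_{t-1} - \eta \, \nabla L_i(w_{t-1}), \qquad D_\psi(w, w') = \tfrac{1}{2}\|w - w'\|_2^2
```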
Abstract
Despite being highly over-parametrized, and having the ability to fully interpolate the training data, deep networks are known to generalize well to unseen data. It is now understood that part of the reason for this is that the training algorithms used have certain implicit regularization properties that ensure interpolating solutions with "good" properties are found. This is best understood in linear over-parametrized models where it has been shown that the celebrated stochastic gradient descent (SGD) algorithm finds an interpolating solution that is closest in Euclidean distance to the initial weight vector. Different regularizers, replacing Euclidean distance with Bregman divergence, can be obtained if we replace SGD with stochastic mirror descent (SMD). Empirical observations have shown that in the deep network setting, SMD achieves a generalization performance that is different…
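The implicit-regularization property of SGD described in the abstract can be checked numerically. The following is a minimal sketch, not the paper's experimental setup: the dimensions, step size, squared loss, and random data are all assumptions chosen for illustration, and the helper names `sgd_interpolate` and `closest_interpolator` are hypothetical. It runs single-sample SGD on an over-parametrized linear model and compares the result with the closed-form interpolating solution nearest to the initial weights.

```python
import numpy as np


def sgd_interpolate(X, y, w0, lr=0.5, epochs=200, seed=0):
    """Single-sample SGD on the squared loss 0.5 * (x_i @ w - y_i) ** 2."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    n = X.shape[0]
    for _ in range(epochs):
        for i in rng.permutation(n):  # one pass over the data in random order
            residual = X[i] @ w - y[i]
            w -= lr * residual * X[i]  # gradient of the per-sample loss
    return w


def closest_interpolator(X, y, w0):
    """Closed form for argmin ||w - w0||_2 subject to X @ w = y."""
    return w0 + np.linalg.pinv(X) @ (y - X @ w0)


# Assumed toy problem: n = 20 samples, d = 200 parameters, labels in {-1, +1}.
rng = np.random.default_rng(1)
n, d = 20, 200
X = rng.standard_normal((n, d)) / np.sqrt(d)  # rows have norm close to 1
y = rng.choice([-1.0, 1.0], size=n)
w0 = rng.standard_normal(d)  # arbitrary non-zero initialization

w_sgd = sgd_interpolate(X, y, w0)
w_star = closest_interpolator(X, y, w0)

train_residual = np.max(np.abs(X @ w_sgd - y))  # how well SGD interpolates
gap = np.linalg.norm(w_sgd - w_star)  # distance to the nearest interpolator
print(f"max training residual: {train_residual:.2e}")
print(f"distance to closest interpolator: {gap:.2e}")
```

Every SGD update is a multiple of a training row, so the iterates never leave the affine set `w0 + rowspace(X)`; the only interpolating point in that set is the Euclidean projection of `w0` onto the solutions of `X @ w = y`, which is why both printed quantities should be close to zero.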
Taxonomy
Topics: Gaussian Processes and Bayesian Inference · Domain Adaptation and Few-Shot Learning · Bayesian Modeling and Causal Inference
Methods: Stochastic Gradient Descent
