Implicit Regularization Properties of Variance Reduced Stochastic Mirror Descent

Yiling Luo; Xiaoming Huo; Yajun Mei

arXiv:2205.00058·stat.ML·August 30, 2022

TL;DR

This paper investigates the implicit regularization properties of the variance reduced stochastic mirror descent (VRSMD) algorithm, proving its convergence to the minimum mirror interpolant in linear regression and demonstrating its effectiveness in sparse model estimation.

Contribution

It establishes the implicit regularization property of VRSMD and provides theoretical and empirical insights into its performance in linear regression and sparse models.

Findings

01. VRSMD converges to the minimum mirror interpolant in linear regression.

02. VRSMD exhibits implicit regularization similar to gradient descent.

03. Numerical examples show VRSMD's empirical effectiveness.
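
For readers unfamiliar with the term "minimum mirror interpolant", one common formalization from the mirror-descent literature is sketched below. The symbols $X$ (design matrix), $y$ (responses), $\psi$ (mirror potential), $w_0$ (initial point) and $D_\psi$ (Bregman divergence) are notation assumed here rather than taken from this page, and the paper's exact statement may differ.

```latex
\hat{w} \;=\; \arg\min_{w \,:\, Xw = y} D_\psi(w, w_0),
\qquad
D_\psi(w, w_0) \;=\; \psi(w) - \psi(w_0) - \langle \nabla\psi(w_0),\, w - w_0 \rangle .
```

Under this reading, choosing $\psi(w) = \tfrac{1}{2}\|w\|_2^2$ and $w_0 = 0$ gives the minimum $\ell_2$-norm interpolant, which is the familiar implicit bias of gradient descent in overparameterized least squares.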

Abstract

In machine learning and statistical data analysis, we often encounter an objective function that is a sum whose number of terms may equal the sample size, which can be enormous. In such a setting, the stochastic mirror descent (SMD) algorithm is a numerically efficient method, since each iteration involves only a very small subset of the data. The variance-reduced version of SMD (VRSMD) can further improve on SMD by inducing faster convergence. On the other hand, algorithms such as gradient descent and stochastic gradient descent have an implicit regularization property that leads to better performance in terms of generalization error. Little is known about whether such a property holds for VRSMD. We prove here that the discrete VRSMD estimator sequence converges to the minimum mirror interpolant in linear regression. This establishes the implicit…
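
The abstract does not spell out the update rule, so the following is only a minimal sketch of what a variance-reduced stochastic mirror descent loop could look like on an overparameterized least-squares problem. Everything specific in it is an assumption made for illustration: the SVRG-style gradient correction, the potential $\psi(w) = \frac{1}{q}\sum_j |w_j|^q$ with $q = 1.5$, the zero initialization, the step size, and the hypothetical function names `mirror_map`, `inverse_mirror_map` and `vrsmd`. It should not be read as the authors' algorithm.

```python
import numpy as np


def mirror_map(w, q):
    """Gradient of the assumed potential psi(w) = (1/q) * sum_j |w_j|**q."""
    return np.sign(w) * np.abs(w) ** (q - 1.0)


def inverse_mirror_map(z, q):
    """Inverse of mirror_map: takes a dual iterate back to the primal space."""
    return np.sign(z) * np.abs(z) ** (1.0 / (q - 1.0))


def vrsmd(X, y, q=1.5, epochs=1000, inner_steps=None, seed=0):
    """SVRG-style stochastic mirror descent for f(w) = (1/2n) * ||Xw - y||^2.

    Hypothetical helper for illustration; hyperparameters are assumptions.
    Returns the final primal iterate w and dual iterate z.
    """
    n, d = X.shape
    rng = np.random.default_rng(seed)
    inner_steps = 2 * n if inner_steps is None else inner_steps
    # Assumed step size: a fraction of 1 / (largest squared row norm).
    eta = 0.5 / np.max(np.sum(X ** 2, axis=1))

    z = np.zeros(d)                # dual iterate, corresponds to w_0 = 0
    w = inverse_mirror_map(z, q)
    for _ in range(epochs):
        # Snapshot: residuals and full gradient at the current iterate.
        snap_resid = X @ w - y
        full_grad = X.T @ snap_resid / n
        for _ in range(inner_steps):
            i = rng.integers(n)
            x_i = X[i]
            # Variance-reduced gradient:
            # grad f_i(w) - grad f_i(w_snapshot) + full gradient at snapshot.
            g = x_i * ((x_i @ w - y[i]) - snap_resid[i]) + full_grad
            z = z - eta * g                  # gradient step in the dual space
            w = inverse_mirror_map(z, q)     # map back to the primal space
    return w, z


# Small overparameterized example (n < d), so many interpolants exist.
data_rng = np.random.default_rng(0)
n, d = 10, 100
X = data_rng.standard_normal((n, d))
w_true = np.zeros(d)
w_true[:3] = 1.0                   # sparse ground truth, assumed for the demo
y = X @ w_true

w_hat, z_hat = vrsmd(X, y)
rel_residual = np.linalg.norm(X @ w_hat - y) / np.linalg.norm(y)
print("relative residual:", rel_residual)
```

In this sketch every update to the dual iterate `z_hat` is a combination of rows of `X`, so `z_hat` stays in the row space of `X`. Together with a vanishing residual, that is the stationarity condition for minimizing the Bregman divergence from the starting point over all interpolating solutions, which is one way to see why an interpolating limit of such iterations would be a minimum mirror interpolant for the chosen potential.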
