Noise Regularizes Over-parameterized Rank One Matrix Recovery, Provably

Tianyi Liu; Yan Li; Enlu Zhou; Tuo Zhao

arXiv:2202.03535·cs.LG·February 9, 2022

Open Access

TL;DR

This paper shows that randomly perturbing gradient descent acts as an implicit regularizer in over-parameterized rank one matrix recovery: the perturbed estimator attains a mean square error of O(σ^2/d), compared with O(σ^2) for gradient descent without perturbation, where σ^2 is the variance of the observational noise.

Contribution

It provides a theoretical analysis of how random perturbation in gradient descent regularizes an over-parameterized model, yielding a lower mean square error than gradient descent without perturbation.

Findings

01 Randomly perturbed gradient descent attains a mean square error of O(σ^2/d)

02 Gradient descent without random perturbation attains a mean square error of only O(σ^2)

03 Algorithmic noise acts as an implicit regularizer when learning over-parameterized models

Abstract

We investigate the role of noise in optimization algorithms for learning over-parameterized models. Specifically, we consider the recovery of a rank one matrix $Y^{*} \in \mathbb{R}^{d \times d}$ from a noisy observation $Y$ using an over-parameterized model. We parameterize the rank one matrix $Y^{*}$ by $X X^{\top}$, where $X \in \mathbb{R}^{d \times d}$. We then show that under mild conditions, the estimator obtained by the randomly perturbed gradient descent algorithm using the square loss function attains a mean square error of $O(\sigma^{2}/d)$, where $\sigma^{2}$ is the variance of the observational noise. In contrast, the estimator obtained by gradient descent without random perturbation only attains a mean square error of $O(\sigma^{2})$. Our result partially justifies the implicit regularization effect of noise when learning over-parameterized models, and provides new understanding of training…
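The setup in the abstract can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' algorithm: the abstract does not specify the exact perturbation scheme, so the code below assumes Gaussian noise is added to the iterate before the gradient is evaluated. The function names (`square_loss_grad`, `recover`, `mse`), the loss scaling of 1/4, the per-entry definition of mean square error, and all hyperparameters (`lr`, `steps`, `perturb_std`, `init_scale`) are assumptions chosen for illustration.

```python
import numpy as np


def square_loss_grad(X, Y):
    """Gradient of f(X) = 1/4 * ||X X^T - Y||_F^2 with respect to X."""
    R = X @ X.T - Y
    # Y need not be symmetric, so symmetrize the residual.
    return 0.5 * (R + R.T) @ X


def recover(Y, steps=500, lr=0.05, perturb_std=0.0, init_scale=0.1, seed=0):
    """Estimate Y* as X X^T with a full d x d factor X.

    perturb_std = 0 gives plain gradient descent. A positive value
    evaluates the gradient at X + xi with Gaussian xi, which is an
    assumed form of "random perturbation" for this sketch.
    """
    rng = np.random.default_rng(seed)
    d = Y.shape[0]
    # Small random initialization (assumed, not taken from the paper).
    X = init_scale * rng.standard_normal((d, d)) / np.sqrt(d)
    for _ in range(steps):
        xi = perturb_std * rng.standard_normal((d, d))
        X = X - lr * square_loss_grad(X + xi, Y)
    return X @ X.T


def mse(Y_hat, Y_star):
    """Average squared error per matrix entry (assumed definition)."""
    return float(np.mean((Y_hat - Y_star) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    d, sigma = 20, 0.1
    x_star = rng.standard_normal(d)
    x_star /= np.linalg.norm(x_star)
    Y_star = np.outer(x_star, x_star)                 # rank one ground truth
    Y = Y_star + sigma * rng.standard_normal((d, d))  # noisy observation

    print("plain GD MSE:    ", mse(recover(Y), Y_star))
    print("perturbed GD MSE:", mse(recover(Y, perturb_std=0.1), Y_star))
```

Running the script prints the two errors for one random instance. The numbers depend on the assumed hyperparameters and seed, so they should be read as a qualitative illustration and not as a check of the $O(\sigma^{2}/d)$ and $O(\sigma^{2})$ rates.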

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Neural Networks and Applications