Algorithmic Regularization in Model-free Overparametrized Asymmetric Matrix Factorization

Liwei Jiang; Yudong Chen; Lijun Ding

arXiv:2203.02839·cs.LG·August 22, 2023·1 cites



Open Access

TL;DR

This paper demonstrates that vanilla gradient descent with early stopping can effectively recover principal components and produce optimal low-rank approximations in overparametrized asymmetric matrix factorization without explicit regularization.

Contribution

The paper gives a theoretical analysis of how gradient descent implicitly regularizes in a model-free, overparametrized setting. Its complexity bounds are nearly dimension-free and require only minimal assumptions on the rank or singular values of the observed matrix.

Findings

1. Gradient descent recovers principal components sequentially.

2. Early stopping yields optimal low-rank approximation.

3. Complexity depends logarithmically on approximation error.

Abstract

We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization. The model-free setting is considered, with minimal assumption on the rank or singular values of the observed matrix, where the global optima provably overfit. We show that vanilla gradient descent with small random initialization sequentially recovers the principal components of the observed matrix. Consequently, when equipped with proper early stopping, gradient descent produces the best low-rank approximation of the observed matrix without explicit regularization. We provide a sharp characterization of the relationship between the approximation error, iteration complexity, initialization size and stepsize. Our complexity bound is almost dimension-free and depends logarithmically on the approximation error, with significantly more lenient requirements on the…
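The sequential recovery and early-stopping behavior described in the abstract can be illustrated with a small numerical sketch. Everything below is an assumption made for illustration and is not taken from the paper: the objective is taken to be 0.5 * ||F G^T - X||_F^2, and the function name `gd_error_curves`, the test matrix, the initialization scale, the stepsize rule, and the iteration count are all hypothetical choices. The sketch tracks how far the iterate F G^T is from the best rank-1 and rank-2 approximations of X at every step.

```python
import numpy as np

def gd_error_curves(X, k, ranks, init_scale=1e-6, n_iters=2000, seed=0):
    """Plain gradient descent on 0.5 * ||F G^T - X||_F^2 (assumed objective).

    Returns, for each r in `ranks`, the per-iteration relative Frobenius
    distance between F G^T and the best rank-r approximation of X.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Best rank-r approximations from the truncated SVD.
    targets = {r: (U[:, :r] * s[:r]) @ Vt[:r] for r in ranks}
    stepsize = 0.1 / s[0]  # hypothetical choice: a small multiple of 1 / ||X||_2
    # Small random initialization of both factors.
    F = init_scale * rng.standard_normal((m, k))
    G = init_scale * rng.standard_normal((n, k))
    curves = {r: [] for r in ranks}
    for _ in range(n_iters):
        residual = F @ G.T - X
        # Simultaneous update: both gradients use the old F and G.
        F, G = F - stepsize * residual @ G, G - stepsize * residual.T @ F
        product = F @ G.T
        for r in ranks:
            err = np.linalg.norm(product - targets[r]) / np.linalg.norm(targets[r])
            curves[r].append(err)
    return {r: np.array(c) for r, c in curves.items()}

# Hypothetical full-rank test matrix: three dominant singular values, flat tail.
rng = np.random.default_rng(1)
m, n = 30, 20
U, _ = np.linalg.qr(rng.standard_normal((m, n)))  # m x n, orthonormal columns
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.concatenate([[10.0, 5.0, 1.0], 0.1 * np.ones(n - 3)])
X = (U * s) @ V.T

curves = gd_error_curves(X, k=n, ranks=(1, 2))  # k = n: overparametrized factors
t1, t2 = int(np.argmin(curves[1])), int(np.argmin(curves[2]))
err1, err2 = float(curves[1].min()), float(curves[2].min())
print(f"closest to rank-1 target at iteration {t1}, relative error {err1:.2e}")
print(f"closest to rank-2 target at iteration {t2}, relative error {err2:.2e}")
print(f"relative error to rank-2 target at the last iterate: {curves[2][-1]:.2e}")
```

In this toy setup the iterate should pass close to the rank-1 target first and the rank-2 target later, then drift away from the rank-2 target as smaller singular values are fitted. That drift is why a stopping time has to be chosen. The sketch picks the stopping iteration by comparing against the true truncated SVD, which is only possible in a synthetic experiment.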


Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Matrix Theory and Algorithms · Stochastic Gradient Optimization Techniques

Methods: Early Stopping