Algorithmic Regularization in Model-free Overparametrized Asymmetric Matrix Factorization
Liwei Jiang, Yudong Chen, Lijun Ding

TL;DR
This paper shows that vanilla gradient descent with small random initialization and early stopping recovers the principal components of the observed matrix and produces its best low-rank approximation in overparametrized asymmetric matrix factorization, without explicit regularization.
Contribution
It provides a theoretical analysis of how gradient descent implicitly regularizes in a model-free, overparametrized setting, with nearly dimension-free complexity bounds and minimal assumptions on the rank or singular values of the observed matrix.
Findings
Gradient descent recovers principal components sequentially.
Early stopping yields optimal low-rank approximation.
Iteration complexity is almost dimension-free and depends logarithmically on the approximation error.
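The findings above can be illustrated with a small numerical sketch. It is not the authors' code. It assumes the nonconvex objective is f(F, G) = 0.5 * ||F G^T - X||_F^2, and the test matrix, the factor width k, the initialization scale alpha, and the stepsize rule are all illustrative choices. The helper names best_rank_r and gd_error_paths are hypothetical.

```python
import numpy as np

def best_rank_r(X, r):
    """Best rank-r approximation of X via truncated SVD."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def gd_error_paths(X, k, ranks, alpha=1e-6, iters=1500, seed=0):
    """Run vanilla gradient descent on 0.5 * ||F G^T - X||_F^2 and record,
    at every iteration, the relative distance from F G^T to the best
    rank-r approximation of X for each r in `ranks`."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    eta = 0.1 / np.linalg.norm(X, 2)           # assumed stepsize: 0.1 / spectral norm
    F = alpha * rng.standard_normal((m, k))    # small random initialization
    G = alpha * rng.standard_normal((n, k))
    targets = [best_rank_r(X, r) for r in ranks]
    errs = np.empty((iters + 1, len(ranks)))
    for t in range(iters + 1):
        P = F @ G.T
        errs[t] = [np.linalg.norm(P - T) / np.linalg.norm(T) for T in targets]
        R = P - X                              # residual
        # Simultaneous update: both gradients use the old F and G.
        F, G = F - eta * (R @ G), G - eta * (R.T @ F)
    return errs

# Illustrative full-rank matrix (no low-rank model): three dominant
# singular values followed by a flat tail.
rng = np.random.default_rng(1)
m, n = 40, 30
U, _ = np.linalg.qr(rng.standard_normal((m, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
sing_vals = np.concatenate(([10.0, 5.0, 2.0], 0.1 * np.ones(n - 3)))
X = (U * sing_vals) @ V.T

# Overparametrized factors: k equals min(m, n), far above the target ranks.
errs = gd_error_paths(X, k=30, ranks=[1, 2])
for j, r in enumerate([1, 2]):
    t_best = int(errs[:, j].argmin())
    print(f"rank {r}: best iteration {t_best}, "
          f"relative error {errs[t_best, j]:.2e}, "
          f"error at last iteration {errs[-1, j]:.2e}")
```

In this sketch the stopping time is picked by comparing against the truncated SVD, which is only possible in a simulation. The point is that some intermediate iterate is close to the best rank-r approximation, while later iterates drift toward X itself and so move away from it.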
Abstract
We study the asymmetric matrix factorization problem under a natural nonconvex formulation with arbitrary overparametrization. The model-free setting is considered, with minimal assumption on the rank or singular values of the observed matrix, where the global optima provably overfit. We show that vanilla gradient descent with small random initialization sequentially recovers the principal components of the observed matrix. Consequently, when equipped with proper early stopping, gradient descent produces the best low-rank approximation of the observed matrix without explicit regularization. We provide a sharp characterization of the relationship between the approximation error, iteration complexity, initialization size and stepsize. Our complexity bound is almost dimension-free and depends logarithmically on the approximation error, with significantly more lenient requirements on the…
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Matrix Theory and Algorithms · Stochastic Gradient Optimization Techniques
Methods: Early Stopping
