Over-Parametrized Matrix Factorization in the Presence of Spurious Stationary Points

Armin Eftekhari

arXiv:2112.13269·cs.LG·February 9, 2022

TL;DR

This paper studies the optimization landscape of over-parametrized matrix factorization. It shows that gradient flow converges to a global minimizer despite the presence of spurious stationary points, provided that the initialization is rank-deficient and sufficiently close to the feasible set.

Contribution

The paper shows that gradient flow with a rank-deficient initialization avoids spurious stationary points. This differs from local refinement methods, and the result does not rely on the restricted isometry property.

Findings

01

Gradient flow converges to a global minimizer despite spurious stationary points (SSPs).

02

Rank-deficient initialization is crucial for convergence.

03

Heuristic discretization inspired by primal-dual algorithms is effective.

Abstract

Motivated by the emerging role of interpolating machines in signal processing and machine learning, this work considers the computational aspects of over-parametrized matrix factorization. In this context, the optimization landscape may contain spurious stationary points (SSPs), which are proved to be full-rank matrices. The presence of these SSPs means that it is impossible to hope for any global guarantees in over-parametrized matrix factorization. For example, when initialized at an SSP, the gradient flow will be trapped there forever. Nevertheless, despite these SSPs, we establish in this work that the gradient flow of the corresponding merit function converges to a global minimizer, provided that its initialization is rank-deficient and sufficiently close to the feasible set of the optimization problem. We numerically observe that a heuristic discretization of the proposed gradient…
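The role of rank-deficient initialization can be illustrated with a small numerical sketch. The abstract does not state the merit function, so the code below assumes a least-squares merit f(X) = 0.5 * sum_k (<A_k, X X^T> - b_k)^2 with random symmetric measurement matrices A_k; the function names, problem sizes, step size, and the use of plain gradient descent are all illustrative assumptions, not the paper's discretization. For this assumed merit the gradient has the form 2 G X with G symmetric, so every iterate is a left multiple of the initialization and its rank can never exceed the initial rank. Since the SSPs are full-rank, a rank-deficient start cannot land on one.

```python
import numpy as np

def residual(X, A, b):
    # r_k = <A_k, X X^T> - b_k for each symmetric measurement matrix A_k
    return np.einsum("kij,ij->k", A, X @ X.T) - b

def merit(X, A, b):
    # Assumed merit function (not taken from the paper): 0.5 * ||r||^2
    r = residual(X, A, b)
    return 0.5 * float(r @ r)

def gradient(X, A, b):
    # For symmetric A_k the gradient is 2 * G @ X with G = sum_k r_k A_k
    G = np.einsum("k,kij->ij", residual(X, A, b), A)
    return 2.0 * G @ X

def run_descent(d=6, p=6, true_rank=2, init_rank=2, m=40,
                step=0.005, iters=400, seed=0):
    """Plain gradient descent from a rank-deficient start (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    B = rng.standard_normal((m, d, d))
    A = (B + B.transpose(0, 2, 1)) / (2.0 * np.sqrt(m))  # symmetric A_k
    X_true = rng.standard_normal((d, true_rank)) / np.sqrt(d)
    b = np.einsum("kij,ij->k", A, X_true @ X_true.T)
    # d x p initialization of rank init_rank < min(d, p)
    left = rng.standard_normal((d, init_rank))
    right = rng.standard_normal((init_rank, p)) / np.sqrt(p)
    X0 = 0.1 * left @ right
    X = X0.copy()
    for _ in range(iters):
        # Each step is X <- (I - 2 * step * G) @ X, a left multiple of X
        X = X - step * gradient(X, A, b)
    return X0, X, merit(X0, A, b), merit(X, A, b)

if __name__ == "__main__":
    X0, X, f0, f1 = run_descent()
    print("rank at start:", np.linalg.matrix_rank(X0, tol=1e-8))
    print("rank at end:  ", np.linalg.matrix_rank(X, tol=1e-8))
    print("merit at start and end:", f0, f1)
```

The explicit tolerance in `matrix_rank` is a deliberate choice: round-off can leave tiny nonzero singular values in directions that are exactly zero in exact arithmetic.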
