Convergence of Alternating Gradient Descent for Matrix Factorization
Rachel Ward, Tamara G. Kolda

TL;DR
This paper proves that alternating gradient descent converges efficiently to a low-rank matrix factorization for asymmetric matrices, with a simple proof technique and practical initialization that improves convergence.
Contribution
It provides a convergence guarantee for AGD on asymmetric matrix factorization with a novel, simple proof and effective random initialization.
Findings
AGD reaches $\frac{1}{\text{poly}(n)}$-accuracy in $O\left(\left(\frac{\text{cond}(\textbf{A})}{\text{approximation}}\right)^2 \text{polylog}(n)\right)$ iterations.
A simple uniform PL inequality and Lipschitz smoothness are established for the iterates.
Experimental results show the proposed initialization significantly accelerates convergence.
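As a rough illustration of why a uniform PL inequality plus Lipschitz smoothness yields a fast rate, the standard textbook argument is sketched below. It is not the paper's proof: the symbols $f$, $z_t$, $\mu$, $L$ and the step size $\eta = 1/L$ are assumptions introduced here, and the minimum value of $f$ is taken to be $0$, as in exact low-rank factorization.

```latex
% Generic PL + smoothness argument (illustrative; constants are not the paper's).
% Assumptions: f >= 0 with minimum value 0, gradient step z_{t+1} = z_t - (1/L) \nabla f(z_t).
\begin{align*}
\text{(smoothness)}\quad & f(z_{t+1}) \le f(z_t) - \frac{1}{2L}\,\|\nabla f(z_t)\|^2, \\
\text{(PL)}\quad & \tfrac{1}{2}\,\|\nabla f(z_t)\|^2 \ge \mu\, f(z_t), \\
\Longrightarrow\quad & f(z_{t+1}) \le \Bigl(1 - \frac{\mu}{L}\Bigr) f(z_t)
\quad\Longrightarrow\quad f(z_T) \le e^{-\mu T / L}\, f(z_0).
\end{align*}
% Hence T >= (L / \mu) \log(1/\epsilon) steps give f(z_T) <= \epsilon f(z_0).
```

Under this reading, the ratio $L/\mu$ plays the role of the squared condition number in the iteration bound above. That correspondence is a heuristic interpretation, not a statement taken from the paper.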
Abstract
We consider alternating gradient descent (AGD) with fixed step size applied to the asymmetric matrix factorization objective. We show that, for a rank-$r$ matrix $\mathbf{A} \in \mathbb{R}^{m \times n}$, $T = C \left(\frac{\sigma_1(\mathbf{A})}{\sigma_r(\mathbf{A})}\right)^2 \log(1/\epsilon)$ iterations of alternating gradient descent suffice to reach an $\epsilon$-optimal factorization with high probability starting from an atypical random initialization. The factors have rank $d \geq r$ so that $\mathbf{X}_T \in \mathbb{R}^{m \times d}$ and $\mathbf{Y}_T \in \mathbb{R}^{n \times d}$, and mild overparameterization suffices for the constant $C$ in the iteration complexity $T$ to be an absolute constant. Experiments suggest that our proposed initialization is not merely of theoretical benefit, but rather significantly…
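The alternating updates described in the abstract can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the objective is taken to be $\frac{1}{2}\|\mathbf{A} - \mathbf{X}\mathbf{Y}^\top\|_F^2$, and the initialization is a simplified asymmetric one (one factor started in the column space of $\mathbf{A}$, the other started near zero) that does not reproduce the paper's exact scaling. The names `make_low_rank` and `agd_factorize`, the step size, and the problem sizes are all hypothetical choices made for illustration.

```python
import numpy as np


def make_low_rank(m, n, singular_values, rng):
    """Build an m-by-n matrix with the prescribed nonzero singular values."""
    r = len(singular_values)
    # Orthonormal bases obtained from reduced QR factorizations of Gaussian matrices.
    U, _ = np.linalg.qr(rng.standard_normal((m, r)))
    V, _ = np.linalg.qr(rng.standard_normal((n, r)))
    return U @ np.diag(singular_values) @ V.T


def agd_factorize(A, d, eta, num_iters, seed=0):
    """Alternating gradient descent on 0.5 * ||A - X Y^T||_F^2 with fixed step eta.

    Initialization (an illustrative assumption, not the paper's exact rule):
    X starts in the column space of A, Y starts as a small random matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Phi1 = rng.standard_normal((n, d)) / np.sqrt(d)
    Phi2 = rng.standard_normal((n, d)) / np.sqrt(n)
    X = A @ Phi1          # m-by-d, asymmetric "warm" factor
    Y = 1e-2 * Phi2       # n-by-d, small random factor
    norm_A_sq = np.linalg.norm(A, "fro") ** 2
    rel_errors = []
    for _ in range(num_iters):
        # Gradient step in X with Y held fixed.
        R = X @ Y.T - A
        X = X - eta * (R @ Y)
        # Gradient step in Y using the freshly updated X.
        R = X @ Y.T - A
        Y = Y - eta * (R.T @ X)
        rel_errors.append(np.linalg.norm(X @ Y.T - A, "fro") ** 2 / norm_A_sq)
    return X, Y, rel_errors


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = make_low_rank(30, 20, [1.0, 0.8, 0.6], rng)  # rank 3, condition number 1/0.6
    X, Y, errs = agd_factorize(A, d=10, eta=0.2, num_iters=3000, seed=2)
    print("final relative squared error:", errs[-1])
```

The second half-step deliberately uses the updated `X`, which is what distinguishes alternating from simultaneous gradient descent. The step size `eta=0.2` is chosen conservatively for a matrix with largest singular value 1 and would need rescaling for other inputs.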
Peer Reviews
Decision: NeurIPS 2023 spotlight
The (asymmetric) matrix factorization problem is both an important problem in itself and an important test bed for developing techniques to analyze non-convex optimization. I think this paper takes a significant step towards understanding (alternating) gradient descent in this setting. It improves on the previous best known bound for GD, which was roughly $\kappa^3$. I find the improvement due to the asymmetric initialization quite illuminating, and I think it will inspire other advantageous (yet quic
I don't see any major weaknesses.
- Excellent novelty
- Theoretical proof is elegant
- Promising numerical results
It is worth noting that there are numerous competitors beyond first-order methods in the field to solve an exact rank-r matrix decomposition problem. To further enrich the discussion, it would be valuable if the authors explored potential avenues for generalizing their results. For example, investigating the application of their findings in matrix completion or the singular value decomposition (SVD) of a low-rank matrix with added noise could provide insights into the broader applicability of th
- The main technical contribution of this paper is the introduction and analysis of an asymmetric warm starting rule for matrix factorization. To the best of my knowledge this is an original technical contribution.
- The theoretical analysis is interesting and advances the state of the art. Matrix factorization is a fundamental optimization task that among others, can be viewed as a subproblem of neural network training. As a result, progress in theoretical understanding of optimization algorith
- It seems to me that the assumption in line 454 that $V^\top \Phi_1$ has i.i.d. entries is incorrect, and that it only has i.i.d. columns. In this case, a different lemma should be used in place of Proposition A.1. Does this change anything in the bounds?
- I have some comments about the experiments in Section 6. It seems like the authors compare different algorithms using the same step size; however, in my opinion it would be fairer to compare using the best fixed step size for each algorithm. This i
Taxonomy
Topics: Face and Expression Recognition · Matrix Theory and Algorithms
