Provable Acceleration of Nesterov's Accelerated Gradient for Rectangular Matrix Factorization and Linear Neural Networks

Zhenghao Xu; Yuqing Wang; Tuo Zhao; Rachel Ward; Molei Tao

arXiv:2410.09640 · cs.LG · December 3, 2024

TL;DR

This paper proves that Nesterov's accelerated gradient (NAG) method attains an iteration complexity of O(κ log(1/ε)) for rectangular matrix factorization, the best-known bound among first-order methods, and extends the result to linear neural networks. The analysis relies on a novel unbalanced initialization.

Contribution

The paper establishes provable acceleration of Nesterov's method for nonconvex rectangular matrix factorization and linear neural networks, using a new unbalanced initialization strategy.
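For concreteness, one common way to write the factorization problem is the squared-loss objective below, using the abstract's notation ($A \in \mathbb{R}^{m \times n}$ of rank $r$, factors with $d \geq r$ columns). This page does not state the loss, so the exact form (including the factor ½) is an assumption. The objective is nonconvex because the product $XY^\top$ couples the factors: with $m = n = d = 1$ and $A = a > 0$, restricting to $X = Y = t$ gives $\tfrac{1}{2}(t^2 - a)^2$, which has minimizers $t = \pm\sqrt{a}$ and a local maximum at $t = 0$.

```latex
\begin{aligned}
f(X, Y) &= \tfrac{1}{2}\,\| X Y^{\top} - A \|_F^{2},
  \qquad X \in \mathbb{R}^{m \times d},\; Y \in \mathbb{R}^{n \times d},\\
\nabla_X f(X, Y) &= (X Y^{\top} - A)\, Y,\\
\nabla_Y f(X, Y) &= (X Y^{\top} - A)^{\top} X.
\end{aligned}
```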

Findings

01

NAG attains an iteration complexity of O(κ log(1/ε)) for rectangular matrix factorization, compared with O(κ² log(1/ε)) for gradient descent.

02

Unbalanced initialization enables faster convergence without large network widths.

03

Results extend to linear neural networks with minimal width requirements.
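The abstract gives the GD bound as O(κ² log(1/ε)) and the NAG bound as O(κ log(1/ε)). To get a feel for the gap, plug in illustrative values κ = 100 and ε = 10⁻⁶ and ignore the constants hidden in O(·), so these are orders of magnitude rather than predicted iteration counts:

```latex
\kappa^{2}\ln\tfrac{1}{\epsilon} = 10^{4}\cdot\ln 10^{6} \approx 10^{4}\cdot 13.8 \approx 1.4\times 10^{5},
\qquad
\kappa\ln\tfrac{1}{\epsilon} = 10^{2}\cdot\ln 10^{6} \approx 1.4\times 10^{3}.
```

The ratio of the two is exactly κ, so the improvement matters most when A is ill-conditioned.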

Abstract

We study the convergence rate of first-order methods for rectangular matrix factorization, which is a canonical nonconvex optimization problem. Specifically, given a rank-$r$ matrix $A \in \mathbb{R}^{m \times n}$, we prove that gradient descent (GD) can find a pair of $\epsilon$-optimal solutions $X_T \in \mathbb{R}^{m \times d}$ and $Y_T \in \mathbb{R}^{n \times d}$, where $d \geq r$, satisfying $\|X_T Y_T^{\top} - A\|_F \leq \epsilon \|A\|_F$, in $T = O(\kappa^{2} \log \frac{1}{\epsilon})$ iterations with high probability, where $\kappa$ denotes the condition number of $A$. Furthermore, we prove that Nesterov's accelerated gradient (NAG) attains an iteration complexity of $O(\kappa \log \frac{1}{\epsilon})$, which is the best-known bound of first-order methods for rectangular matrix factorization.…
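The two algorithms in the abstract can be compared on a toy instance with the minimal sketch below. It assumes the squared loss f(X, Y) = ½‖XY^⊤ − A‖_F² and a standard constant-momentum NAG update (a gradient step taken at the extrapolated point W = X_t + β(X_t − X_{t−1}), likewise for Y); setting β = 0 recovers GD. The helper names (`factorize`, `unbalanced_init`), the step size η = 1/‖X_0‖_F², the momentum β = (κ − 1)/(κ + 1) (which presumes κ is known), and the start X_0 = c·A, Y_0 = 0 are all hypothetical choices for illustration. In particular, the paper's unbalanced initialization is randomized (its GD guarantee holds with high probability) and may differ from this deterministic stand-in.

```python
import math

# Dense matrices as lists of rows; pure Python keeps the sketch dependency-free.
Matrix = list[list[float]]


def matmul(P: Matrix, Q: Matrix) -> Matrix:
    return [[sum(p * q for p, q in zip(row, col)) for col in zip(*Q)] for row in P]


def transpose(P: Matrix) -> Matrix:
    return [list(col) for col in zip(*P)]


def combine(P: Matrix, Q: Matrix, a: float, b: float) -> Matrix:
    """Entrywise a * P + b * Q."""
    return [[a * p + b * q for p, q in zip(rp, rq)] for rp, rq in zip(P, Q)]


def fro(P: Matrix) -> float:
    """Frobenius norm."""
    return math.sqrt(sum(x * x for row in P for x in row))


def factorize(A: Matrix, X0: Matrix, Y0: Matrix, eta: float, beta: float,
              eps: float, max_iter: int) -> tuple[Matrix, Matrix, int]:
    """Minimize 0.5 * ||X Y^T - A||_F^2 with constant-momentum NAG (beta = 0 gives GD).

    Returns (X, Y, T), where T is the first iteration with
    ||X_T Y_T^T - A||_F <= eps * ||A||_F, or max_iter if that never happens.
    """
    X, Y, X_prev, Y_prev = X0, Y0, X0, Y0
    target = eps * fro(A)
    for t in range(max_iter):
        if fro(combine(matmul(X, transpose(Y)), A, 1.0, -1.0)) <= target:
            return X, Y, t
        # Extrapolated (look-ahead) point: W = X + beta * (X - X_prev), same for Z.
        W = combine(X, X_prev, 1.0 + beta, -beta)
        Z = combine(Y, Y_prev, 1.0 + beta, -beta)
        R = combine(matmul(W, transpose(Z)), A, 1.0, -1.0)  # residual W Z^T - A
        grad_X = matmul(R, Z)             # (W Z^T - A) Z
        grad_Y = matmul(transpose(R), W)  # (W Z^T - A)^T W
        X_prev, Y_prev = X, Y
        X = combine(W, grad_X, 1.0, -eta)
        Y = combine(Z, grad_Y, 1.0, -eta)
    return X, Y, max_iter


def unbalanced_init(A: Matrix, c: float) -> tuple[Matrix, Matrix]:
    """Hypothetical unbalanced start: large X0 = c * A (so d = n) and Y0 = 0."""
    n = len(A[0])
    X0 = [[c * a for a in row] for row in A]
    Y0 = [[0.0] * n for _ in range(n)]
    return X0, Y0


# Toy rank-2 target in R^{3x2} with singular values 1 and 0.1, so kappa = 10.
A = [[1.0, 0.0], [0.0, 0.1], [0.0, 0.0]]
kappa = 10.0
X0, Y0 = unbalanced_init(A, c=10.0)
eta = 1.0 / fro(X0) ** 2  # ||X0||_F^2 upper-bounds the largest eigenvalue of X0^T X0

X_gd, Y_gd, t_gd = factorize(A, X0, Y0, eta, beta=0.0, eps=1e-6, max_iter=5000)
X_nag, Y_nag, t_nag = factorize(A, X0, Y0, eta, beta=(kappa - 1) / (kappa + 1),
                                eps=1e-6, max_iter=5000)
print(f"GD: {t_gd} iterations, NAG: {t_nag} iterations")
```

On this diagonal example both runs reach the abstract's ε-optimality criterion, and the momentum run should stop after noticeably fewer iterations, in line with the κ² versus κ dependence. Because A is diagonal and X_0 = c·A, the problem decouples into scalar subproblems, which keeps the toy easy to reason about but is far from the general case the paper analyzes.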

Taxonomy

Topics: Neural Networks and Applications