Understanding Incremental Learning of Gradient Descent: A Fine-grained Analysis of Matrix Sensing
Jikai Jin, Zhiyuan Li, Kaifeng Lyu, Simon S. Du, Jason D. Lee

TL;DR
This paper analyzes in detail how gradient descent (GD) with small initialization incrementally learns low-rank matrices in the matrix sensing problem. It shows that GD behaves like greedy low-rank learning heuristics, and it extends the analysis beyond the over-parameterized regime.
Contribution
The paper characterizes the entire GD learning process for matrix sensing, rather than only the first (rank-1) learning phase covered by existing works. The characterization also covers the under-parameterized regime, which was not previously analyzed.
Findings
GD with small initialization mimics greedy low-rank learning
GD learns solutions with increasing rank sequentially
The analysis applies to both over- and under-parameterized regimes
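The sequential rank growth listed above can be observed in a small simulation. The following is a minimal sketch, not the authors' experimental setup. It assumes a symmetric positive semidefinite ground truth of rank 3, symmetrized Gaussian measurement matrices, the factorization Z = U U^T, and the squared loss f(U) = 1/(4m) * sum_i (<A_i, U U^T> - y_i)^2. The dimensions, eigenvalues, step size, initialization scale, and the threshold used in the hypothetical helper `numerical_rank` are all illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 10, 2000                        # matrix size and number of measurements (illustrative)
alpha, lr, steps = 1e-3, 0.05, 2000    # init scale, step size, iterations (illustrative)

# Assumed ground truth: PSD matrix of rank 3 with well-separated eigenvalues 4, 2, 1.
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))
eigs = np.array([4.0, 2.0, 1.0])
Z_star = (Q[:, :3] * eigs) @ Q[:, :3].T

# Symmetrized Gaussian measurements A_i and observations y_i = <A_i, Z_star>.
G = rng.standard_normal((m, d, d))
A = (G + G.transpose(0, 2, 1)) / 2
y = np.einsum("kij,ij->k", A, Z_star)


def numerical_rank(Z, tol=0.1):
    """Hypothetical helper: count eigenvalues of symmetric Z above tol."""
    return int(np.sum(np.linalg.eigvalsh(Z) > tol))


# Over-parameterized factor U (d x d) with small random initialization.
U = alpha * rng.standard_normal((d, d))
ranks = []
for t in range(steps):
    Z = U @ U.T
    ranks.append(numerical_rank(Z))
    residual = np.einsum("kij,ij->k", A, Z) - y
    # Gradient of f(U) for symmetric A_i: (1/m) * sum_i residual_i * A_i @ U.
    grad = np.einsum("k,kij->ij", residual, A) @ U / m
    U -= lr * grad

Z = U @ U.T
ranks.append(numerical_rank(Z))
rel_err = np.linalg.norm(Z - Z_star) / np.linalg.norm(Z_star)

# Report the steps at which the numerical rank of U U^T changes.
changes = [(t, r) for t, (p, r) in enumerate(zip(ranks, ranks[1:]), start=1) if r != p]
print("rank changes (step, new rank):", changes)
print("relative recovery error:", rel_err)
```

Printing `changes` shows when each new direction is picked up. Under these assumptions, components with larger eigenvalues should emerge earlier, because their growth rate from the small initialization is higher. Shrinking `alpha` should separate the phases further, at the cost of more iterations.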
Abstract
It is believed that Gradient Descent (GD) induces an implicit bias towards good generalization in training machine learning models. This paper provides a fine-grained analysis of the dynamics of GD for the matrix sensing problem, whose goal is to recover a low-rank ground-truth matrix from near-isotropic linear measurements. It is shown that GD with small initialization behaves similarly to the greedy low-rank learning heuristics (Li et al., 2020) and follows an incremental learning procedure (Gissin et al., 2019): GD sequentially learns solutions with increasing ranks until it recovers the ground truth matrix. Compared to existing works which only analyze the first learning phase for rank-1 solutions, our result provides characterizations for the whole learning process. Moreover, besides the over-parameterized regime that many prior works focused on, our analysis of the incremental…
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and ELM · Stochastic Gradient Optimization Techniques
