Emergent Low-Rank Training Dynamics in MLPs with Smooth Activations
Alec S. Xu, Can Yaras, Matthew Asato, Qing Qu, Laura Balzano

TL;DR
This paper investigates the low-dimensional subspace dynamics of MLP training with smooth activations, providing theoretical insights and demonstrating that low-rank parameterizations can match full models' performance.
Contribution
The paper theoretically characterizes the invariant low-dimensional subspaces that emerge when training two-layer MLPs with smooth nonlinear activations, and shows empirically that low-rank parameterizations can reach accuracy comparable to full models.
Findings
Training dynamics concentrate in low-dimensional subspaces
Invariant subspaces are precisely characterized for two-layer networks
Low-rank parameterizations can match full model performance
Abstract
Recent empirical evidence has demonstrated that the training dynamics of large-scale deep neural networks occur within low-dimensional subspaces. While this has inspired new research into low-rank training, compression, and adaptation, theoretical justification for these dynamics in nonlinear networks remains limited. To address this gap, this paper analyzes the learning dynamics of multi-layer perceptrons (MLPs) under gradient descent (GD). We demonstrate that the weight dynamics concentrate within invariant low-dimensional subspaces throughout training. Theoretically, we precisely characterize these invariant subspaces for two-layer networks with smooth nonlinear activations, providing insight into their emergence. Experimentally, we validate that this phenomenon extends beyond our theoretical assumptions. Leveraging these insights, we empirically…
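The claim that weight updates concentrate in a low-dimensional subspace can be probed numerically. The following is a minimal sketch, not the authors' experimental setup: it trains a two-layer MLP with a tanh activation by full-batch gradient descent on synthetic Gaussian data and returns the singular values of the first-layer update W1_final - W1_init. The function names, dimensions, squared-error loss, learning rate, and data distribution are all assumptions chosen for illustration.

```python
import numpy as np


def train_two_layer_mlp(d=32, m=32, k=4, n=256, steps=200, lr=0.05, seed=0):
    """Full-batch GD on L = ||W2 tanh(W1 X) - Y||_F^2 / (2n).

    All sizes and the synthetic data are illustrative assumptions,
    not values taken from the paper.
    """
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((d, n))          # inputs, one column per sample
    Y = rng.standard_normal((k, n))          # synthetic regression targets
    W1 = rng.standard_normal((m, d)) / np.sqrt(d)
    W2 = rng.standard_normal((k, m)) / np.sqrt(m)
    W1_init = W1.copy()
    losses = []
    for _ in range(steps):
        H = np.tanh(W1 @ X)                  # hidden activations (smooth)
        R = W2 @ H - Y                       # residuals
        losses.append(0.5 * np.sum(R ** 2) / n)
        grad_W2 = R @ H.T / n
        # Backpropagate through tanh: d tanh(z)/dz = 1 - tanh(z)^2
        grad_W1 = ((W2.T @ R) * (1.0 - H ** 2)) @ X.T / n
        W1 = W1 - lr * grad_W1
        W2 = W2 - lr * grad_W2
    return W1_init, W1, np.array(losses)


def update_spectrum(W_init, W_final):
    """Singular values (descending) of the accumulated weight update."""
    return np.linalg.svd(W_final - W_init, compute_uv=False)


if __name__ == "__main__":
    W1_init, W1_final, losses = train_two_layer_mlp()
    s = update_spectrum(W1_init, W1_final)
    energy = np.cumsum(s ** 2) / np.sum(s ** 2)
    # Fraction of update energy captured by the top-k singular directions
    print("top-k energy fraction:", energy[:8])
```

Inspecting how quickly the cumulative energy approaches 1 gives a rough indication of how concentrated the update is. How pronounced the effect is will depend on the assumed data, widths, and number of output dimensions.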
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Reservoir Computing · Neural Networks and Applications
