Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization
Hancheng Min, Enrique Mallada, René Vidal

TL;DR
This paper analyzes how neurons in a two-layer ReLU network align with data during early training with small initialization, providing bounds on the alignment time and demonstrating convergence and low-rank structure post-alignment.
Contribution
It offers a theoretical analysis of neuron alignment dynamics in small-initialization training, including convergence bounds and low-rank structure emergence.
Findings
Neurons align with data within O(log n / sqrt(μ)) time.
Loss converges at a rate of O(1/t) after early alignment.
First layer weights become approximately low-rank.
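One way to check the "approximately low-rank" finding numerically is to inspect the singular values of the first-layer weight matrix. The sketch below is illustrative only: the helper names `stable_rank` and `numerical_rank`, the tolerance `rel_tol`, and the toy matrix are assumptions made here, and the paper's own notion of approximate low rank may be stated differently.

```python
import numpy as np

def stable_rank(W):
    """Stable rank ||W||_F^2 / ||W||_2^2, a smooth proxy for rank."""
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    return float(np.sum(s ** 2) / s[0] ** 2)

def numerical_rank(W, rel_tol=1e-2):
    """Count singular values above rel_tol times the largest one."""
    s = np.linalg.svd(W, compute_uv=False)
    return int(np.sum(s > rel_tol * s[0]))

# Toy first-layer matrix (hypothetical, not from the paper): each of the
# 50 neurons is a positive multiple of one of two fixed directions, plus
# a small perturbation, mimicking neurons that have aligned to two groups.
rng = np.random.default_rng(0)
h, d = 50, 10
directions = np.zeros((2, d))
directions[0, 0] = 1.0
directions[1, 1] = 1.0
group = np.arange(h) % 2                      # neuron -> direction index
scale = rng.uniform(0.5, 1.5, size=h)         # positive neuron norms
W_toy = scale[:, None] * directions[group] + 1e-4 * rng.standard_normal((h, d))

print("numerical rank:", numerical_rank(W_toy))
print("stable rank:", stable_rank(W_toy))
```

Applied to trained weights instead of `W_toy`, a numerical rank far below `min(h, d)` would be consistent with the low-rank structure described above.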
Abstract
This paper studies the problem of training a two-layer ReLU network for binary classification using gradient flow with small initialization. We consider a training dataset with well-separated input vectors: Any pair of input data with the same label are positively correlated, and any pair with different labels are negatively correlated. Our analysis shows that, during the early phase of training, neurons in the first layer try to align with either the positive data or the negative data, depending on its corresponding weight on the second layer. A careful analysis of the neurons' directional dynamics allows us to provide an O(log n / sqrt(μ)) upper bound on the time it takes for all neurons to achieve good alignment with the input data, where n is the number of data points and μ measures how well the data are separated. After the early alignment phase, the…
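The setting in the abstract can be explored with a small simulation. The following is a minimal sketch, not the authors' code: it replaces gradient flow with small-step gradient descent, and the logistic loss, the balanced initialization (|v_j| equal to the norm of w_j), the synthetic dataset, and all function names and hyperparameters are assumptions chosen for illustration. It tracks the cosine between each first-layer neuron and the sum of the positive or negative inputs, selected by the sign of that neuron's second-layer weight.

```python
import numpy as np

def make_data(n=20, d=10, noise=0.1, seed=0):
    # Hypothetical dataset: points cluster around +u (label +1) or -u (label -1),
    # so same-label pairs correlate positively and opposite-label pairs negatively.
    rng = np.random.default_rng(seed)
    u = np.zeros(d)
    u[0] = 1.0
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    X = y[:, None] * u + noise * rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, y

def init_params(h, d, eps, seed=1):
    # Small initialization of scale eps; balanced second layer (an assumption).
    rng = np.random.default_rng(seed)
    W = eps * rng.standard_normal((h, d))
    signs = rng.choice([-1.0, 1.0], size=h)
    v = signs * np.linalg.norm(W, axis=1)
    return W, v

def loss_and_grads(W, v, X, y):
    # Network f(x) = sum_j v_j * relu(w_j . x), mean logistic loss (an assumption).
    pre = X @ W.T                      # (n, h) pre-activations
    act = np.maximum(pre, 0.0)
    margins = y * (act @ v)
    loss = np.mean(np.logaddexp(0.0, -margins))
    g = -y / (1.0 + np.exp(margins)) / len(y)   # d loss / d f(x_i)
    grad_v = act.T @ g
    grad_W = (((pre > 0) * g[:, None]).T @ X) * v[:, None]
    return loss, grad_W, grad_v

def train(W, v, X, y, lr=0.05, steps=400):
    # Small-step gradient descent as a stand-in for gradient flow.
    W, v = W.copy(), v.copy()
    for _ in range(steps):
        _, gW, gv = loss_and_grads(W, v, X, y)
        W -= lr * gW
        v -= lr * gv
    return W, v

def alignment(W, signs, X, y):
    # Cosine between each neuron and the sum of positive (or negative) inputs.
    x_pos = X[y > 0].sum(axis=0)
    x_neg = X[y < 0].sum(axis=0)
    targets = np.where(signs[:, None] > 0, x_pos, x_neg)
    num = np.sum(W * targets, axis=1)
    den = np.linalg.norm(W, axis=1) * np.linalg.norm(targets, axis=1) + 1e-30
    return num / den

X, y = make_data()
W0, v0 = init_params(h=50, d=X.shape[1], eps=1e-4)
signs = np.sign(v0)
W1, v1 = train(W0, v0, X, y)
before = alignment(W0, signs, X, y)
after = alignment(W1, signs, X, y)
loss_before = loss_and_grads(W0, v0, X, y)[0]
loss_after = loss_and_grads(W1, v1, X, y)[0]
print("mean alignment before/after:", before.mean(), after.mean())
print("loss before/after:", loss_before, loss_after)
```

Shrinking `eps` keeps the weights small for longer, so the change in neuron directions can be observed while the loss has barely moved, which is the early phase the abstract describes.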
Taxonomy
Topics: Neural Networks and Applications · Neural dynamics and brain function · stochastic dynamics and bifurcation
Methods: ALIGN
