Early Neuron Alignment in Two-layer ReLU Networks with Small Initialization

Hancheng Min; Enrique Mallada; René Vidal

arXiv:2307.12851 · cs.LG · March 26, 2024

TL;DR

This paper analyzes how neurons in a two-layer ReLU network align with data during early training with small initialization, providing bounds on the alignment time and demonstrating convergence and low-rank structure post-alignment.

Contribution

It offers a theoretical analysis of neuron alignment dynamics in small-initialization training, including convergence bounds and low-rank structure emergence.

Findings

01 Neurons align with data within O(log n / sqrt(μ)) time.

02 Loss converges at a rate of O(1/t) after early alignment.

03 First layer weights become approximately low-rank.

Abstract

This paper studies the problem of training a two-layer ReLU network for binary classification using gradient flow with small initialization. We consider a training dataset with well-separated input vectors: any pair of input data with the same label are positively correlated, and any pair with different labels are negatively correlated. Our analysis shows that, during the early phase of training, neurons in the first layer try to align with either the positive data or the negative data, depending on each neuron's corresponding weight on the second layer. A careful analysis of the neurons' directional dynamics allows us to provide an $O\left(\frac{\log n}{\sqrt{\mu}}\right)$ upper bound on the time it takes for all neurons to achieve good alignment with the input data, where $n$ is the number of data points and $\mu$ measures how well the data are separated. After the early alignment phase, the…
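
The early alignment described in the abstract can be observed in a small simulation. The following is a minimal sketch, not the authors' code, and it rests on several assumptions that the text above does not state: gradient flow is approximated by gradient descent with a small step size, the loss is the exponential loss, the second-layer weights are initialized with random signs and magnitudes equal to the norms of the corresponding first-layer weights, and the data are synthetic points clustered around two opposite directions. All function names and parameter values are hypothetical.

```python
import numpy as np


def make_separated_data(n=20, d=5, noise=0.05, seed=0):
    """Synthetic unit-norm points: positives near +e1, negatives near -e1."""
    rng = np.random.default_rng(seed)
    y = np.where(np.arange(n) % 2 == 0, 1.0, -1.0)
    e1 = np.zeros(d)
    e1[0] = 1.0
    X = y[:, None] * e1 + noise * rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    return X, y


def separation(X, y):
    """Smallest label-signed inner product over all pairs (a stand-in for mu)."""
    return ((y[:, None] * y[None, :]) * (X @ X.T)).min()


def alignment(W, v, X, y):
    """Cosine between each neuron w_j and sign(v_j) * sum_i y_i x_i."""
    target = (y[:, None] * X).sum(axis=0)
    target /= np.linalg.norm(target)
    cos = (W @ target) / np.linalg.norm(W, axis=1)
    return np.sign(v) * cos


def train(X, y, h=50, eps=1e-4, lr=1e-3, steps=400, seed=1):
    """Gradient descent on f(x) = sum_j v_j * relu(w_j . x) from a small init."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = eps * rng.standard_normal((h, d))  # small first-layer weights
    # Assumption: balanced second layer, |v_j| = ||w_j||, with random signs.
    v = rng.choice([-1.0, 1.0], size=h) * np.linalg.norm(W, axis=1)
    before = alignment(W, v, X, y)
    for _ in range(steps):
        pre = X @ W.T                        # (n, h) pre-activations
        act = np.maximum(pre, 0.0)           # ReLU outputs
        f = act @ v                          # (n,) network outputs
        g = -y * np.exp(-y * f)              # d(loss)/df, exponential loss (assumed)
        grad_W = ((g[:, None] * (pre > 0)) * v[None, :]).T @ X  # (h, d)
        grad_v = act.T @ g                   # (h,)
        W -= lr * grad_W
        v -= lr * grad_v
    return W, v, before, alignment(W, v, X, y)


X, y = make_separated_data()
mu = separation(X, y)
W, v, before, after = train(X, y)
print(f"separation (mu stand-in): {mu:.3f}")
print(f"mean alignment before: {before.mean():.3f}, after: {after.mean():.3f}")
print(f"largest first-layer weight after training: {np.abs(W).max():.2e}")
```

In this sketch, a neuron with a positive second-layer weight is scored by its cosine with the label-weighted data direction, and a neuron with a negative weight by its cosine with the opposite direction, so a score near 1 means the neuron points toward the class selected by its second-layer sign. The weights are expected to remain small over this short horizon, which is the regime the abstract calls the early phase.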

Taxonomy

Topics: Neural Networks and Applications · Neural dynamics and brain function · stochastic dynamics and bifurcation

Methods: ALIGN