TL;DR
This paper studies how symmetries of the target function and input distribution shape gradient-flow training of infinitely wide two-layer ReLU networks without bias, showing that the dynamics reduce to simpler models and, for odd targets, converge exponentially.
Contribution
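It describes a general class of symmetries that, when shared by the target function and the input distribution, are preserved by the training dynamics. It then shows that the dynamics reduce to those of a linear predictor for odd targets and to a lower-dimensional PDE for targets with low-dimensional structure.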
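As a rough illustration of what a symmetry of the predictor looks like, the sketch below builds a finite-width, bias-free two-layer ReLU network whose set of neurons is invariant under a reflection T, and checks that the resulting predictor satisfies f(Tx) = f(x). This covers only the static invariance at a fixed set of parameters, not the preservation of symmetry along the gradient flow that the paper studies. The finite width, the choice of a Householder reflection, and all names (`predictor`, `W_sym`, `a_sym`) are assumptions made for illustration.

```python
import numpy as np

def predictor(x, W, a):
    """Bias-free two-layer ReLU network: mean over neurons of a_j * relu(w_j . x)."""
    return np.mean(a * np.maximum(W @ x, 0.0))

rng = np.random.default_rng(1)
d, m = 4, 32  # input dimension and number of base neurons (illustrative values)

# Householder reflection: symmetric, orthogonal, and an involution (T @ T = I).
u = rng.standard_normal(d)
u /= np.linalg.norm(u)
T = np.eye(d) - 2.0 * np.outer(u, u)

W = rng.standard_normal((m, d))
a = rng.standard_normal(m)

# Symmetrize the neurons: for every (w_j, a_j) also include (T w_j, a_j).
# Because T is symmetric, row j of W @ T is (T w_j)^T.
W_sym = np.vstack([W, W @ T])
a_sym = np.concatenate([a, a])

x = rng.standard_normal(d)
f_x = predictor(x, W_sym, a_sym)
f_Tx = predictor(T @ x, W_sym, a_sym)

# Applying T to the input only permutes the two halves of the neuron set.
assert np.isclose(f_x, f_Tx)
```

The check works because T is an involution: evaluating at Tx swaps the roles of w_j and T w_j, leaving the sum over neurons unchanged.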
Findings
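For odd target functions, the predictor dynamics reduce to those of a (non-linearly parameterized) linear predictor, with guaranteed exponential convergence.
When the target function has a low-dimensional structure, the gradient flow PDE reduces to a lower-dimensional PDE.
Informal and numerical arguments suggest that the input neurons align with the lower-dimensional structure of the problem.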
Abstract
We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function and the input distribution, are preserved by the dynamics. We then study more specific cases. When the target function is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When the target function has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
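One elementary identity makes the appearance of a linear predictor in the odd case plausible: since relu(z) - relu(-z) = z, the odd part of any bias-free two-layer ReLU network is a linear function of the input. The sketch below checks this numerically at finite width. It is a sanity check of that identity only, not a reproduction of the paper's argument or of the infinite-width dynamics, and the names (`two_layer_relu`, `beta`) and sizes are assumptions chosen for illustration.

```python
import numpy as np

def two_layer_relu(x, W, a):
    """Bias-free two-layer ReLU network: mean over neurons of a_j * relu(w_j . x)."""
    return np.mean(a * np.maximum(W @ x, 0.0))

rng = np.random.default_rng(0)
d, m = 5, 64  # input dimension and width (illustrative values)
W = rng.standard_normal((m, d))
a = rng.standard_normal(m)
x = rng.standard_normal(d)

# Odd part of the network: (f(x) - f(-x)) / 2.
odd_part = 0.5 * (two_layer_relu(x, W, a) - two_layer_relu(-x, W, a))

# Using relu(z) - relu(-z) = z, the odd part equals <beta, x>
# with beta = (1/2) * mean_j a_j w_j, which depends non-linearly on (a, W).
beta = 0.5 * np.mean(a[:, None] * W, axis=0)
linear_value = beta @ x

assert np.isclose(odd_part, linear_value)
```

Here `beta` is linear in the input but a product of the two layers' parameters, which is one way to read the phrase "non-linearly parameterized linear predictor".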
