On the symmetries in the dynamics of wide two-layer neural networks

Karl Hajjar (LMO; CELESTE); Lenaic Chizat (EPFL)

arXiv:2211.08771 · cs.LG · February 10, 2023

TL;DR

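This paper studies how symmetries shared by the target function and the input distribution shape the gradient-flow training dynamics of infinitely wide two-layer ReLU neural networks (without bias). For odd targets, the predictor behaves like a linear predictor whose exponential convergence can be guaranteed; for targets with low-dimensional structure, the gradient flow PDE reduces to a lower-dimensional PDE.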

Contribution

It describes a general class of symmetries that the training dynamics preserve when both the target function and the input distribution satisfy them, and it proves reductions to a linear predictor or to a lower-dimensional PDE under more specific symmetry assumptions.

Findings

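01. For odd target functions, the predictor dynamics reduce to those of a (non-linearly parameterized) linear predictor.

02. When the target function has a low-dimensional structure, the gradient flow PDE reduces to a lower-dimensional PDE.

03. Informal and numerical arguments suggest that the input neurons align with the lower-dimensional structure of the problem.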
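The first finding rests on an elementary identity, relu(z) - relu(-z) = z. As a minimal sketch (not the authors' code), the Python snippet below assumes a finite-width, bias-free network whose neurons come in mirrored pairs (a, w) and (-a, -w). This pairing is an illustrative assumption about how an odd predictor can arise, and the helper names `symmetrize` and `equivalent_linear_weights` are hypothetical.

```python
import random


def relu(z):
    return max(z, 0.0)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def two_layer_relu(neurons, x):
    """Bias-free two-layer ReLU network: (1/m) * sum_j a_j * relu(<w_j, x>)."""
    return sum(a * relu(dot(w, x)) for a, w in neurons) / len(neurons)


def symmetrize(neurons):
    """Assumption for illustration: add the mirrored neuron (-a, -w) for each (a, w)."""
    return neurons + [(-a, [-wi for wi in w]) for a, w in neurons]


def equivalent_linear_weights(base_neurons):
    """Vector beta with symmetrized_network(x) == <beta, x>.

    Each mirrored pair contributes a * (relu(<w, x>) - relu(-<w, x>)) = a * <w, x>,
    and the symmetrized network has width 2 * len(base_neurons).
    """
    d = len(base_neurons[0][1])
    m = 2 * len(base_neurons)
    return [sum(a * w[i] for a, w in base_neurons) / m for i in range(d)]


rng = random.Random(0)
d, n = 3, 50
base = [(rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(d)]) for _ in range(n)]
net = symmetrize(base)
beta = equivalent_linear_weights(base)

x = [rng.gauss(0, 1) for _ in range(d)]
# Difference between the ReLU network and the linear map x -> <beta, x>.
gap = abs(two_layer_relu(net, x) - dot(beta, x))
print(gap)
```

The printed gap should be at the level of floating-point rounding. Note that the linear weights `beta` depend on the parameters (a, w) through products a * w, which is one way to read "non-linearly parameterized" in this toy setting.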

Abstract

We consider the idealized setting of gradient flow on the population risk for infinitely wide two-layer ReLU neural networks (without bias), and study the effect of symmetries on the learned parameters and predictors. We first describe a general class of symmetries which, when satisfied by the target function $f^{*}$ and the input distribution, are preserved by the dynamics. We then study more specific cases. When $f^{*}$ is odd, we show that the dynamics of the predictor reduces to that of a (non-linearly parameterized) linear predictor, and its exponential convergence can be guaranteed. When $f^{*}$ has a low-dimensional structure, we prove that the gradient flow PDE reduces to a lower-dimensional PDE. Furthermore, we present informal and numerical arguments that suggest that the input neurons align with the lower-dimensional structure of the problem.
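The claim that shared symmetries are preserved by the dynamics can be illustrated on a toy problem. The sketch below is not the paper's setting: it replaces the infinite-width population gradient flow with a finite-width network, a finite data set, and discrete gradient steps. The symmetry (`reflect`, a sign flip of the last coordinate), the target `relu(x[0])`, and all hyperparameters are assumptions chosen for illustration. The neurons and the data are both initialized as mirrored sets, and the script checks that the neurons remain mirrored after training.

```python
import random


def relu(z):
    return max(z, 0.0)


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def reflect(v):
    """Hypothetical symmetry: flip the sign of the last coordinate."""
    return v[:-1] + [-v[-1]]


def predict(neurons, x):
    return sum(a * relu(dot(w, x)) for a, w in neurons) / len(neurons)


def gradient_step(neurons, data, target, lr):
    """One full-batch gradient step on the squared loss.

    Per-neuron gradients are rescaled by the width (mean-field scaling).
    """
    residuals = [predict(neurons, x) - target(x) for x in data]
    n = len(data)
    updated = []
    for a, w in neurons:
        pre = [dot(w, x) for x in data]
        grad_a = sum(r * relu(p) for r, p in zip(residuals, pre)) / n
        grad_w = [
            sum(r * a * x[i] for r, p, x in zip(residuals, pre, data) if p > 0) / n
            for i in range(len(w))
        ]
        updated.append((a - lr * grad_a, [wi - lr * gi for wi, gi in zip(w, grad_w)]))
    return updated


def target(x):
    # Depends only on the first coordinate, so it is invariant under `reflect`.
    return relu(x[0])


rng = random.Random(0)
d, half_width, half_data = 3, 4, 10
base_neurons = [(rng.gauss(0, 1), [rng.gauss(0, 1) for _ in range(d)]) for _ in range(half_width)]
base_points = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(half_data)]

# Neuron j + half_width starts as the mirror image of neuron j; same for the data.
neurons = base_neurons + [(a, reflect(w)) for a, w in base_neurons]
data = base_points + [reflect(x) for x in base_points]

for _ in range(20):
    neurons = gradient_step(neurons, data, target, lr=0.02)

# Largest deviation between each trained neuron and the mirror of its partner.
symmetry_gap = max(
    abs(a - a_m) + sum(abs(u - v) for u, v in zip(reflect(w), w_m))
    for (a, w), (a_m, w_m) in zip(neurons[:half_width], neurons[half_width:])
)
print(symmetry_gap)
```

Because the trained neuron set stays closed under `reflect`, the trained predictor is also invariant under it, that is, `predict(neurons, x)` and `predict(neurons, reflect(x))` agree up to rounding.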

Code & Models

Repositories

karl-hajjar/learning-structure
PyTorch · Official

Taxonomy

Methods: ALIGN