Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit
Karl Hajjar (LMO, CELESTE), Lénaïc Chizat (LMO), Christophe Giraud (LMO)

TL;DR
This paper investigates the dynamics of deep neural networks with integrable parameterizations in the infinite-width limit, revealing conditions for non-trivial learning and proposing methods to avoid trivial stationary points.
Contribution
It introduces integrable parameterizations for deep networks, analyzes their behavior at infinite width, and proposes modifications to enable effective training.
Findings
Under standard initialization, integrable parameterizations of networks with more than four layers start at a stationary point in the infinite-width limit.
Large initial learning rates can modify dynamics to avoid trivial stationary points.
Numerical experiments show activation function choices significantly affect behavior.
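A minimal numerical sketch of one ingredient behind the first finding. It rests on an assumption not stated on this page: that an integrable parameterization multiplies every layer after the first by 1/width, with weights drawn i.i.d. from a zero-mean Gaussian. The function name `last_layer_scale`, the tanh activation, and the input dimension are illustrative choices, not taken from the paper. Under these assumptions the signal reaching the last hidden layer at initialization shrinks as the width or the depth grows, which suggests why such a network could sit near a trivial point.

```python
import numpy as np


def last_layer_scale(width, depth, input_dim=32, seed=0):
    """Mean absolute pre-activation of the last hidden layer at initialization.

    Hypothetical helper for illustration only; the scaling below is an
    assumed mean-field-style parameterization, not the paper's exact setup.
    """
    rng = np.random.default_rng(seed)

    # Unit-norm input so that the first-layer pre-activations are of order 1.
    x = rng.standard_normal(input_dim)
    x /= np.linalg.norm(x)

    # First layer: i.i.d. standard Gaussian weights, no width-dependent factor.
    h = rng.standard_normal((width, input_dim)) @ x

    # Remaining layers: i.i.d. zero-mean weights with an assumed 1/width factor.
    for _ in range(depth - 1):
        w = rng.standard_normal((width, width))
        h = (w @ np.tanh(h)) / width

    return float(np.mean(np.abs(h)))


if __name__ == "__main__":
    for m in (64, 256, 1024):
        print(m, last_layer_scale(m, depth=4))
```

Running the script prints the last-layer scale for three widths. Because the weights are zero-mean, each 1/width-scaled sum behaves like a quantity of order width^(-1/2) rather than order 1, so the scale decays with both width and depth in this toy setting.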
Abstract
To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models makes these dynamics difficult to analyze. To overcome these challenges, large-width asymptotics have recently emerged as a fruitful viewpoint and led to practical insights on real-world deep networks. For two-layer neural networks, it has been understood via these asymptotics that the nature of the trained model radically changes depending on the scale of the initial random weights, ranging from a kernel regime (for large initial variance) to a feature learning regime (for small initial variance). For deeper networks more regimes are possible, and in this paper we study in detail a specific choice of "small" initialization corresponding to "mean-field"…
Taxonomy
Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science
