Training Integrable Parameterizations of Deep Neural Networks in the Infinite-Width Limit

Karl Hajjar (LMO, CELESTE); Lénaïc Chizat (LMO); Christophe Giraud (LMO)

arXiv:2110.15596·cs.LG·December 21, 2021

TL;DR

This paper investigates the dynamics of deep neural networks with integrable parameterizations in the infinite-width limit, revealing conditions for non-trivial learning and proposing methods to avoid trivial stationary points.

Contribution

It introduces integrable parameterizations for deep networks, analyzes their behavior at infinite width, and proposes modifications to enable effective training.

Findings

01. With integrable parameterizations, networks with more than four layers start at a stationary point in the infinite-width limit under standard initialization.

02. Large initial learning rates can modify the dynamics so that training escapes this trivial stationary point.

03. Numerical experiments show that the choice of activation function significantly affects behavior.
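For intuition about the first finding, here is a minimal NumPy sketch. It assumes that an integrable ("mean-field") parameterization scales every layer after the first by 1/m, where m is the width, and that weights are i.i.d. standard Gaussian; this page does not state the definition, so treat both as assumptions. The function name `last_layer_rms`, the input dimension, and the tanh activation are hypothetical choices for illustration. The sketch only shows forward pre-activations shrinking as the width grows, which is suggestive of the stationary-point behavior but is not the paper's argument.

```python
import numpy as np

def last_layer_rms(width, depth, scaling, seed=0):
    """Root-mean-square of the last hidden pre-activations for one random input.

    scaling="integrable": layers after the first are divided by width (assumed 1/m rule).
    scaling="sqrt":       layers after the first are divided by sqrt(width), for comparison.
    """
    rng = np.random.default_rng(seed)
    d = 16                                   # hypothetical input dimension
    x = rng.standard_normal(d)
    # First layer: i.i.d. standard Gaussian weights with 1/sqrt(d) scaling.
    h = rng.standard_normal((width, d)) @ x / np.sqrt(d)
    for _ in range(depth - 1):
        w = rng.standard_normal((width, width))
        norm = width if scaling == "integrable" else np.sqrt(width)
        h = w @ np.tanh(h) / norm            # tanh is an illustrative activation
    return float(np.sqrt(np.mean(h ** 2)))

if __name__ == "__main__":
    for m in (64, 256, 1024):
        print(m,
              last_layer_rms(m, 4, "integrable"),
              last_layer_rms(m, 4, "sqrt"))
```

Running the script prints, for each width, the pre-activation scale under both rules. Under the assumed 1/m rule the values should fall quickly as the width increases, while the 1/sqrt(m) rule keeps them at a comparable order of magnitude.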

Abstract

To theoretically understand the behavior of trained deep neural networks, it is necessary to study the dynamics induced by gradient methods from a random initialization. However, the nonlinear and compositional structure of these models makes these dynamics difficult to analyze. To overcome these challenges, large-width asymptotics have recently emerged as a fruitful viewpoint and led to practical insights on real-world deep networks. For two-layer neural networks, it has been understood via these asymptotics that the nature of the trained model radically changes depending on the scale of the initial random weights, ranging from a kernel regime (for large initial variance) to a feature learning regime (for small initial variance). For deeper networks more regimes are possible, and in this paper we study in detail a specific choice of "small" initialization corresponding to "mean-field"…

Code & Models

Repositories

karl-hajjar/wide-networks
PyTorch · Official

Taxonomy

Topics: Model Reduction and Neural Networks · Stochastic Gradient Optimization Techniques · Machine Learning in Materials Science