Large deviations of one-hidden-layer neural networks

Christian Hirsch; Daniel Willhalm

arXiv:2403.09310·math.PR·January 14, 2025·1 cites

TL;DR

This paper establishes large deviation principles for stochastic gradient descent training of one-hidden-layer neural networks with quadratic loss. The weight updates are modeled as an interacting particle system in which the number of neurons and the number of training iterations tend to infinity simultaneously.

Contribution

It derives quenched and annealed large deviation principles for the empirical weight evolution during training, in a regime where the particle updates are discrete and the number of neurons and the number of training steps grow together.

Findings

01

Derived a quenched large deviation principle, conditioned on an initial weight measure.

02

Established an annealed large deviation principle for the empirical weight evolution during training.

03

Modeled the weight dynamics as an interacting particle system with discrete updates and a growing number of particles.

Abstract

We study large deviations in the context of stochastic gradient descent for one-hidden-layer neural networks with quadratic loss. We derive a quenched large deviation principle, where we condition on an initial weight measure, and an annealed large deviation principle for the empirical weight evolution during training when letting the number of neurons and the number of training iterations simultaneously tend to infinity. The weight evolution is treated as an interacting dynamic particle system. The distinctive aspect compared to prior work on interacting particle systems lies in the discrete particle updates, simultaneously with a growing number of particles.
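The setting described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' construction: it assumes a mean-field scaling in which the network output is the average over N neurons and each SGD step has size alpha / N, with roughly N * T steps taken in total. The scalar input, the tanh activation, the sin target, and the names `network_output`, `sgd_step`, and `train` are all hypothetical choices made for illustration.

```python
import math
import random


def network_output(weights, x):
    """Mean-field output: average of c * tanh(w * x) over all neurons."""
    return sum(c * math.tanh(w * x) for c, w in weights) / len(weights)


def sgd_step(weights, x, y, alpha):
    """One discrete SGD update on the quadratic loss 0.5 * (y - f(x))**2.

    Each neuron (particle) moves by a step of order alpha / N, and the
    particles interact only through the shared residual y - f(x).
    """
    n = len(weights)
    residual = y - network_output(weights, x)
    updated = []
    for c, w in weights:
        act = math.tanh(w * x)
        grad_c = act                        # d/dc of c * tanh(w * x)
        grad_w = c * (1.0 - act * act) * x  # d/dw of c * tanh(w * x)
        updated.append((c + alpha / n * residual * grad_c,
                        w + alpha / n * residual * grad_w))
    return updated


def train(n_neurons, horizon, alpha=1.0, seed=0):
    """Run floor(n_neurons * horizon) SGD steps from random initial weights.

    The data distribution (x uniform on [-1, 1], y = sin(x)) is an
    assumption made for this sketch only.
    """
    rng = random.Random(seed)
    weights = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
               for _ in range(n_neurons)]
    for _ in range(int(n_neurons * horizon)):
        x = rng.uniform(-1.0, 1.0)
        y = math.sin(x)
        weights = sgd_step(weights, x, y, alpha)
    return weights
```

The list returned by `train` plays the role of the particle configuration. Its empirical measure, tracked over the training steps, is the kind of object whose fluctuations a large deviation principle describes as the number of neurons grows.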

Taxonomy

Topics: Neural Networks and Applications