Statistical physics and practical training of soft-committee machines

Martin Ahr; Michael Biehl; Robert Urbanczik

arXiv:cond-mat/9812197·cond-mat.dis-nn·October 31, 2009

TL;DR

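This paper uses the replica formalism from statistical physics to analyze the equilibrium states of large layered neural networks (soft-committee machines). It finds a first order phase transition from unspecialized to specialized student configurations at a critical training set size, and shows that the equilibrium predictions quantitatively describe the plateau states observed in stochastic gradient descent training.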

Contribution

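It calculates analytically the quenched free energy of a student network with a very large number of hidden units learning a rule of perfectly matching complexity, and shows that the resulting equilibrium theory describes the plateau states that occur in practical training at sufficiently small but finite learning rates.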

Findings

01 Identification of a first order phase transition at a critical training set size

02 Quantitative agreement between equilibrium theory and stochastic gradient descent simulations

03 Demonstration of plateau states in training corresponding to equilibrium configurations

Abstract

Equilibrium states of large layered neural networks with differentiable activation function and a single, linear output unit are investigated using the replica formalism. The quenched free energy of a student network with a very large number of hidden units learning a rule of perfectly matching complexity is calculated analytically. The system undergoes a first order phase transition from unspecialized to specialized student configurations at a critical size of the training set. Computer simulations of learning by stochastic gradient descent from a fixed training set demonstrate that the equilibrium results describe quantitatively the plateau states which occur in practical training procedures at sufficiently small but finite learning rates.
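
The training setup described in the abstract can be illustrated with a minimal sketch: a student network with K hidden units and a single linear output learns from a fixed training set labeled by a teacher of the same architecture, using stochastic gradient descent on the quadratic error. Several details here are assumptions for illustration and are not taken from the paper: the tanh activation (the abstract only requires a differentiable activation function), the 1/sqrt(N) and 1/sqrt(K) normalizations, the small random initialization, and all numerical values (N, K, P, the learning rate eta, the number of steps).

```python
import numpy as np

def output(W, X):
    """Network outputs for inputs X of shape (P, N) and weights W of shape (K, N).

    Hidden-to-output weights are fixed, so the output unit is linear.
    tanh and both normalizations are assumptions made for this sketch.
    """
    K, N = W.shape
    return np.tanh(X @ W.T / np.sqrt(N)).sum(axis=1) / np.sqrt(K)

def train_error(W, X, y):
    """Mean quadratic error over the fixed training set."""
    return 0.5 * np.mean((output(W, X) - y) ** 2)

def train_sgd(X, y, K, eta=0.1, steps=20000, seed=0):
    """SGD on a fixed training set: one randomly chosen example per step."""
    rng = np.random.default_rng(seed)
    P, N = X.shape
    W = 0.1 * rng.standard_normal((K, N))  # small random initial student weights
    for _ in range(steps):
        mu = rng.integers(P)               # index of the example used in this step
        x = X[mu]
        a = np.tanh(W @ x / np.sqrt(N))    # hidden unit activations
        err = a.sum() / np.sqrt(K) - y[mu]
        # Gradient of 0.5 * err**2 with respect to W
        W -= eta * err * np.outer(1.0 - a ** 2, x) / np.sqrt(K * N)
    return W

# Hypothetical problem sizes, chosen only to keep the example fast.
N, K, P = 20, 2, 200
data_rng = np.random.default_rng(1)
B = data_rng.standard_normal((K, N))        # teacher weights (the rule to be learned)
X_train = data_rng.standard_normal((P, N))  # fixed training inputs
y_train = output(B, X_train)                # teacher labels

W0 = train_sgd(X_train, y_train, K, steps=0)  # same seed, so this is the initial state
W = train_sgd(X_train, y_train, K)

print("initial training error:", train_error(W0, X_train, y_train))
print("final training error:  ", train_error(W, X_train, y_train))
```

To look for the unspecialized and specialized configurations mentioned in the abstract, one could additionally record the student-teacher overlaps `W @ B.T / N` during training and check whether each student unit aligns with one particular teacher unit. At sizes this small the behavior need not resemble the large-network results of the paper.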
