Mean Field Theory of Activation Functions in Deep Neural Networks

Mirco Milletarí; Thiparat Chotibut; Paolo E. Trevisanutto

arXiv:1805.08786 · cs.LG · June 7, 2019 · 1 citation

TL;DR

This paper develops a statistical mechanics model of deep neural networks to understand activation functions, revealing how different activations influence information propagation and network performance.

Contribution

It introduces a mean field theory linking energy-based and feedforward networks, deriving natural activation functions and analyzing their impact on network behavior.

Findings

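01 ReLU emerges as the zero-noise limit of the model's activation.

02 Swish performs more consistently than the other activations across a wider range of network architectures.

03 Analysis of the Hessian spectrum on a classification task shows that the choice of activation function affects its properties.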

Abstract

We present a Statistical Mechanics (SM) model of deep neural networks, connecting the energy-based and the feedforward network (FFN) approaches. We infer that FFNs can be understood as performing three basic steps: encoding, representation validation and propagation. From the mean-field solution of the model, we obtain a set of natural activations, such as Sigmoid, $\tanh$ and ReLU, together with the state-of-the-art Swish; the latter represents the expected information propagating through the network and tends to ReLU in the limit of zero noise. We study the spectrum of the Hessian on an associated classification task, showing that Swish allows for more consistent performance over a wider range of network architectures.
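
The zero-noise limit mentioned in the abstract can be illustrated numerically. The sketch below is not taken from the paper: it assumes the commonly used Swish form x · sigmoid(βx) and assumes that the noise level enters as 1/β, so that zero noise corresponds to β → ∞. The helper names (`sigmoid`, `swish`, `relu`, `max_gap`) and the sample grid are hypothetical choices made for illustration only.

```python
import math


def sigmoid(z):
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def swish(x, beta=1.0):
    """Swish in its usual form x * sigmoid(beta * x).

    Assumption: beta plays the role of an inverse noise level,
    so large beta corresponds to the low-noise regime.
    """
    return x * sigmoid(beta * x)


def relu(x):
    """Rectified linear unit."""
    return max(0.0, x)


def max_gap(beta, xs):
    """Largest absolute difference between Swish and ReLU on the grid xs."""
    return max(abs(swish(x, beta) - relu(x)) for x in xs)


# Hypothetical sample grid on [-5, 5] with step 0.01.
xs = [i / 100.0 for i in range(-500, 501)]

# The gap to ReLU shrinks as beta grows (i.e. as the assumed noise 1/beta -> 0).
betas = (1.0, 10.0, 100.0)
gaps = [max_gap(b, xs) for b in betas]

for b, g in zip(betas, gaps):
    print(f"beta = {b:6.1f}   max |swish - relu| = {g:.5f}")
```

Under these assumptions the difference is |x| · sigmoid(−β|x|), which depends on x only through β|x| once a factor 1/β is pulled out, so the largest gap decays in proportion to 1/β. This is the sense in which Swish approaches ReLU pointwise as the noise is removed.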

Taxonomy

Topics: Quantum many-body systems · Gaussian Processes and Bayesian Inference · Neural Networks and Applications

Methods: Sigmoid Activation