# On the effect of the activation function on the distribution of hidden nodes in a deep network

**Authors:** Philip M. Long, Hanie Sedghi

arXiv: 1901.02104 · 2019-01-09

## TL;DR

This paper studies how the activation function affects the distribution of the lengths of the hidden-variable vectors in fully connected deep networks with random Gaussian weights and biases, and gives conditions under which those lengths become predictable as the network width grows.

## Contribution

It analyzes the joint distribution of the hidden-vector lengths across layers and identifies conditions on the activation function under which the length process converges in probability to a length map in the large-width limit.

## Key findings

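- For activation functions satisfying a minimal set of assumptions, the length process converges in probability, as the width grows, to a length map that is a simple function of the weight and bias variances and the activation function.
- Convergence may fail for activation functions that violate these assumptions.
- The assumptions are satisfied by all activation functions that the authors know to be used in practice.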

## Abstract

We analyze the joint probability distribution on the lengths of the vectors of hidden variables in different layers of a fully connected deep network, when the weights and biases are chosen randomly according to Gaussian distributions, and the input is in $\{ -1, 1\}^N$. We show that, if the activation function $\phi$ satisfies a minimal set of assumptions, satisfied by all activation functions that we know that are used in practice, then, as the width of the network gets large, the 'length process' converges in probability to a length map that is determined as a simple function of the variances of the random weights and biases, and the activation function $\phi$. We also show that this convergence may fail for $\phi$ that violate our assumptions.
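The convergence described in the abstract can be illustrated numerically. The sketch below is not taken from the paper. It assumes the standard mean-field parameterization (weights drawn from $N(0, \sigma_w^2/N)$, biases from $N(0, \sigma_b^2)$) and the usual recursion $q_\ell = \sigma_w^2 \, \mathbb{E}_{z \sim N(0,1)}[\phi(\sqrt{q_{\ell-1}}\, z)^2] + \sigma_b^2$ with $q_1 = \sigma_w^2 + \sigma_b^2$, where $q_\ell$ is the squared length of the layer-$\ell$ pre-activation vector divided by the width. The function names `length_map` and `simulate_lengths` and all parameter values are hypothetical, and the paper's exact definitions may differ.

```python
import numpy as np


def length_map(phi, sigma_w, sigma_b, depth, n_quad=80):
    """Deterministic recursion for the normalized squared lengths (assumed form)."""
    # Gauss-Hermite nodes and weights for the weight function exp(-z^2 / 2);
    # dividing by sqrt(2*pi) turns the weighted sum into an N(0, 1) expectation.
    z, w = np.polynomial.hermite_e.hermegauss(n_quad)
    w = w / np.sqrt(2.0 * np.pi)
    # An input in {-1, 1}^N has squared length exactly N, so q_1 is known.
    q = sigma_w**2 + sigma_b**2
    qs = [q]
    for _ in range(depth - 1):
        q = sigma_w**2 * np.sum(w * phi(np.sqrt(q) * z) ** 2) + sigma_b**2
        qs.append(q)
    return np.array(qs)


def simulate_lengths(phi, sigma_w, sigma_b, depth, width, rng):
    """Empirical normalized squared lengths in one random network of the given width."""
    x = rng.choice([-1.0, 1.0], size=width)  # random input in {-1, 1}^N
    qs = []
    for _ in range(depth):
        W = rng.normal(0.0, sigma_w / np.sqrt(width), size=(width, width))
        b = rng.normal(0.0, sigma_b, size=width)
        h = W @ x + b               # pre-activations of this layer
        qs.append(np.mean(h**2))    # squared length divided by width
        x = phi(h)
    return np.array(qs)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    predicted = length_map(np.tanh, sigma_w=1.5, sigma_b=0.3, depth=4)
    for width in (50, 500, 2000):
        observed = simulate_lengths(np.tanh, 1.5, 0.3, depth=4, width=width, rng=rng)
        print(width, np.max(np.abs(observed - predicted) / predicted))
```

Under these assumptions, the gap between the simulated lengths and the recursion should tend to shrink as `width` increases, which is the qualitative behavior the abstract describes for well-behaved $\phi$ such as tanh.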

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.02104/full.md

## Figures

14 figures with captions in the complete paper: https://tomesphere.com/paper/1901.02104/full.md

## References

17 references — full list in the complete paper: https://tomesphere.com/paper/1901.02104/full.md

---
Source: https://tomesphere.com/paper/1901.02104