Activation function design for deep networks: linearity and effective initialisation

Michael Murray; Vinayak Abrol; Jared Tanner

arXiv:2105.07741·cs.LG·May 18, 2021

TL;DR

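This paper studies how the shape of the activation function affects a deep network at initialisation and, through that, its training. It shows that a sufficiently large linear region around the origin, relative to the bias variance $σ_{b}^{2}$, avoids both rapid convergence of pairwise input correlations and vanishing and exploding gradients, and reports empirical gains in accuracy and training time.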

Contribution

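The paper proves that an activation function with a sufficiently large linear region around the origin, relative to the bias variance $σ_{b}^{2}$ of the random initialisation, avoids two known initialisation problems, and it validates the practical benefits of such activation functions empirically.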

Findings

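01 A sufficiently large linear region around the origin, relative to the bias variance $σ_{b}^{2}$, avoids rapid convergence of pairwise input correlations and vanishing and exploding gradients at initialisation.

02 Networks using such activation functions show benefits in test and training accuracy as well as in training time.

03 The shape of the activation outside the linear region appears to have a relatively limited impact on training.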

Abstract

The activation function deployed in a deep neural network has great influence on the performance of the network at initialisation, which in turn has implications for training. In this paper we study how to avoid two problems at initialisation identified in prior works: rapid convergence of pairwise input correlations, and vanishing and exploding gradients. We prove that both these problems can be avoided by choosing an activation function possessing a sufficiently large linear region around the origin, relative to the bias variance $σ_{b}^{2}$ of the network's random initialisation. We demonstrate empirically that using such activation functions leads to tangible benefits in practice, both in terms of test and training accuracy as well as training time. Furthermore, we observe that the shape of the nonlinear activation outside the linear region appears to have a relatively limited impact…
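To make the idea of a "linear region around the origin" concrete, here is a minimal Python sketch. It is not taken from the paper or its repository: the function names, the half-width parameter `a`, and the tanh-shaped tail are all hypothetical choices made for illustration. The second helper assumes, purely for illustration, that pre-activations are zero-mean Gaussian with standard deviation `sigma`, and reports how much of that distribution falls inside the linear region.

```python
import math


def linear_region_activation(x: float, a: float = 2.0) -> float:
    """Hypothetical activation: identity on [-a, a], tanh-shaped saturation outside.

    The tail choice is an assumption for illustration only; the function is
    continuous with slope 1 at |x| = a and bounded by a + 1 in magnitude.
    """
    if abs(x) <= a:
        return x  # linear region: the activation is exactly the identity
    sign = 1.0 if x > 0 else -1.0
    return sign * (a + math.tanh(abs(x) - a))


def fraction_in_linear_region(a: float, sigma: float) -> float:
    """P(|Z| <= a) for Z ~ N(0, sigma^2), via the Gaussian error function.

    Assumes Gaussian pre-activations with standard deviation sigma; this
    modelling assumption is ours, not a statement of the paper's analysis.
    """
    return math.erf(a / (sigma * math.sqrt(2.0)))


if __name__ == "__main__":
    for a in (0.5, 1.0, 2.0, 4.0):
        frac = fraction_in_linear_region(a, sigma=1.0)
        print(f"a = {a:>3}: fraction of inputs in linear region = {frac:.4f}")
```

Under these assumptions, widening `a` relative to `sigma` means almost every pre-activation passes through the identity part of the function, which is one informal way to read "sufficiently large relative to the scale of the initialisation". The precise condition in terms of $σ_{b}^{2}$ is the one proved in the paper.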

Code & Models

Repositories

Cross-Caps/AFLI (TensorFlow, official)
