Taming the ReLU with Parallel Dither in a Deep Neural Network

Andrew J.R. Simpson

arXiv:1509.05173·cs.LG·September 18, 2015·1 cites

TL;DR

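This paper argues that ReLU activations are effective in deep neural networks because they act as ideal demodulators, which enables fast abstract learning but also produces nonlinear distortion products ("decoy features"). It presents Parallel Dither as a way to suppress those decoy features, preventing overfitting and making learning more reliable.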

Contribution

The paper argues that ReLU acts as a demodulator and shows that Parallel Dither suppresses the nonlinear distortion products that ReLU introduces, improving deep neural network training.

Findings

01

ReLU functions as an ideal demodulator in DNNs.

02

Parallel Dither suppresses decoy features caused by ReLU nonlinearities.

03

Using Parallel Dither prevents overfitting and supports rapid, reliable learning.

Abstract

Rectified Linear Units (ReLU) seem to have displaced traditional 'smooth' nonlinearities as activation-function-du-jour in many - but not all - deep neural network (DNN) applications. However, nobody seems to know why. In this article, we argue that ReLU are useful because they are ideal demodulators - this helps them perform fast abstract learning. However, this fast learning comes at the expense of serious nonlinear distortion products - decoy features. We show that Parallel Dither acts to suppress the decoy features, preventing overfitting and leaving the true features cleanly demodulated for rapid, reliable learning.
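
The demodulation claim in the abstract can be illustrated outside a neural network. The following is a minimal NumPy sketch, not the paper's experiment: it rectifies an amplitude-modulated signal with a ReLU, low-pass filters the result with a moving average, and compares the output to the true envelope. It also measures the energy at twice the carrier frequency, a distortion product that the ReLU creates and that is absent from the input. All names and signal parameters here (`fs`, `f_carrier`, `f_mod`, `amplitude_at`, the 0.5 modulation depth) are assumptions chosen for illustration.

```python
import numpy as np

fs = 10_000        # sample rate in Hz (illustrative assumption)
f_carrier = 500    # carrier frequency in Hz (illustrative assumption)
f_mod = 5          # modulation frequency in Hz (illustrative assumption)
n = fs             # one second of samples

t = np.arange(n) / fs
# Slowly varying, strictly positive envelope riding on a fast carrier.
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * f_mod * t)
signal = envelope * np.cos(2 * np.pi * f_carrier * t)


def relu(x):
    """Half-wave rectifier: pass positive values, zero out negative ones."""
    return np.maximum(x, 0.0)


rectified = relu(signal)

# Crude low-pass filter: moving average over exactly one carrier period.
period = fs // f_carrier
kernel = np.ones(period) / period
demodulated = np.convolve(rectified, kernel, mode="same")

# Ignore the ends, where the moving average sees zero padding.
trim = slice(5 * period, -5 * period)
corr = np.corrcoef(demodulated[trim], envelope[trim])[0, 1]


def amplitude_at(x, freq_hz):
    """Single-sided spectral amplitude of x at the bin nearest freq_hz."""
    spectrum = np.abs(np.fft.rfft(x)) * 2 / len(x)
    return spectrum[int(round(freq_hz * len(x) / fs))]


# Energy at twice the carrier: absent in the input, created by the ReLU.
distortion_in = amplitude_at(signal, 2 * f_carrier)
distortion_out = amplitude_at(rectified, 2 * f_carrier)

print(f"envelope correlation: {corr:.4f}")
print(f"2x-carrier amplitude before ReLU: {distortion_in:.2e}")
print(f"2x-carrier amplitude after ReLU:  {distortion_out:.2e}")
```

In this sketch `corr` should come out close to 1, meaning the rectified and smoothed signal tracks the envelope, while `distortion_out` is clearly nonzero and `distortion_in` is essentially zero. This only mirrors the signal-processing intuition behind the abstract; it does not implement Parallel Dither, whose details are not given in this summary.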

Taxonomy

Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Neural Networks and Reservoir Computing
