Taming the ReLU with Parallel Dither in a Deep Neural Network
Andrew J.R. Simpson

TL;DR
This paper explains why ReLU activations are effective in deep neural networks by identifying their role as demodulators and introduces Parallel Dither as a method to suppress decoy features, improving learning reliability.
Contribution
The paper argues that ReLU acts as a demodulator and proposes Parallel Dither to suppress the resulting nonlinear distortion products (decoy features), making deep neural network training more reliable.
Findings
ReLU is argued to function as an ideal demodulator in DNNs.
Parallel Dither suppresses decoy features caused by ReLU nonlinearities.
Using Parallel Dither improves learning speed and reduces overfitting.
Abstract
Rectified Linear Units (ReLU) seem to have displaced traditional 'smooth' nonlinearities as activation-function-du-jour in many - but not all - deep neural network (DNN) applications. However, nobody seems to know why. In this article, we argue that ReLU are useful because they are ideal demodulators - this helps them perform fast abstract learning. However, this fast learning comes at the expense of serious nonlinear distortion products - decoy features. We show that Parallel Dither acts to suppress the decoy features, preventing overfitting and leaving the true features cleanly demodulated for rapid, reliable learning.
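The two ideas in the abstract can be illustrated with a small toy, which is a sketch under stated assumptions and not the paper's experiment. The first part shows demodulation in the classical signal-processing sense: rectifying an amplitude-modulated carrier with a ReLU and then averaging over one carrier period recovers the slow envelope, scaled by 1/pi. The second part assumes, as one reading of the abstract, that Parallel Dither means averaging over several copies of the same input, each with independent noise added. Under that assumption the averaged ReLU no longer has a hard corner at zero. All names (`demodulate`, `dithered_relu`, the signal parameters) are hypothetical and chosen only for this illustration.

```python
import math
import random


def relu(x):
    """Rectified linear unit for a single float."""
    return x if x > 0.0 else 0.0


def demodulate(signal, period):
    """Rectify with ReLU, then average over one carrier period.

    The moving average is a crude low-pass filter; `period` must be even.
    """
    rectified = [relu(s) for s in signal]
    half = period // 2
    out = []
    for i in range(half, len(signal) - half):
        window = rectified[i - half:i + half]
        out.append(sum(window) / period)
    return out


def dithered_relu(x, amplitude, copies, rng):
    """Average ReLU over `copies` noisy versions of the same input.

    Assumption: uniform noise in [-amplitude, amplitude] stands in for
    the dither; the paper's exact noise model is not given here.
    """
    total = 0.0
    for _ in range(copies):
        total += relu(x + rng.uniform(-amplitude, amplitude))
    return total / copies


# Toy amplitude-modulated signal: slow positive envelope times fast carrier.
N = 4000        # number of samples (illustrative value)
PERIOD = 40     # carrier period in samples (illustrative value)
envelope = [1.0 + 0.5 * math.sin(2 * math.pi * 2 * n / N) for n in range(N)]
carrier = [math.cos(2 * math.pi * n / PERIOD) for n in range(N)]
signal = [m * c for m, c in zip(envelope, carrier)]

# A half-wave rectified cosine has mean 1/pi, so multiply by pi to rescale.
recovered = [math.pi * v for v in demodulate(signal, PERIOD)]
half = PERIOD // 2
max_error = max(abs(r - envelope[i + half]) for i, r in enumerate(recovered))

# Plain ReLU is exactly 0 at x = 0; the dither-averaged version is not.
rng = random.Random(0)
smoothed_at_zero = dithered_relu(0.0, amplitude=1.0, copies=20000, rng=rng)

print("max envelope error:", max_error)
print("plain relu(0):", relu(0.0), "dither-averaged relu(0):", smoothed_at_zero)
```

For uniform noise of amplitude a, the expected value of the averaged ReLU at inputs between -a and a is (x + a)^2 / (4a), a smooth curve. This is one way to picture how averaging over dithered copies could soften the distortion that a hard rectifier introduces, although whether that matches the mechanism the paper has in mind is an assumption here.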
Taxonomy
Topics: Neural Networks and Applications · Image and Signal Denoising Methods · Neural Networks and Reservoir Computing
