Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs)

Brad Carlile; Guy Delamarter; Paul Kinney; Akiko Marti; Brian Whitney

arXiv:1710.09967·cs.LG·November 13, 2017·42 cites

TL;DR

This paper introduces the ISRLU activation function to speed up learning in deep neural networks. ISRLU performs better than ELU while keeping many of the same benefits, and it learns faster and generalizes better than ReLU on CNNs. An efficient variant, ISRU, is suggested for RNNs.

Contribution

The paper proposes the ISRLU activation function and its efficient variant ISRU, demonstrating improved training speed and performance in CNNs and RNNs over existing functions.

Findings

1. ISRLU outperforms ELU and ReLU in CNN training.
2. ISRLU enables faster learning and better generalization.
3. ISRU offers a computationally efficient alternative for RNNs.

Abstract

We introduce the "inverse square root linear unit" (ISRLU) to speed up learning in deep neural networks. ISRLU has better performance than ELU while retaining many of the same benefits. ISRLU and ELU have similar curves and characteristics. Both have negative values, allowing them to push mean unit activation closer to zero and bring the normal gradient closer to the unit natural gradient, ensuring a noise-robust deactivation state and lessening the risk of overfitting. The significant performance advantage of ISRLU on traditional CPUs also carries over to more efficient HW implementations on HW/SW codesign for CNNs/RNNs. In experiments with TensorFlow, ISRLU leads to faster learning and better generalization than ReLU on CNNs. This work also suggests a computationally efficient variant called the "inverse square root unit" (ISRU) which can be used for RNNs. Many RNNs use either long short-term…
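The abstract names ISRLU and ISRU without giving their formulas. The sketch below assumes the usual definitions: ISRU(x) = x / sqrt(1 + αx²) for all x, and ISRLU(x) equal to x for x ≥ 0 and to the ISRU curve for x < 0. The function names, the scalar pure-Python form, and the default `alpha=1.0` are illustrative choices, not taken from this page or from the authors' TensorFlow code.

```python
import math


def isru(x: float, alpha: float = 1.0) -> float:
    """Inverse square root unit (assumed form): x / sqrt(1 + alpha * x^2)."""
    return x / math.sqrt(1.0 + alpha * x * x)


def isrlu(x: float, alpha: float = 1.0) -> float:
    """Assumed ISRLU: identity for x >= 0, ISRU curve for x < 0."""
    return x if x >= 0.0 else isru(x, alpha)


def isrlu_grad(x: float, alpha: float = 1.0) -> float:
    """Derivative of isrlu: 1 for x >= 0, (1 + alpha * x^2) ** -1.5 for x < 0."""
    return 1.0 if x >= 0.0 else (1.0 + alpha * x * x) ** -1.5


if __name__ == "__main__":
    # Print value and slope at a few sample inputs.
    for x in (-10.0, -1.0, 0.0, 2.0):
        print(x, isrlu(x), isrlu_grad(x))
```

Under these assumed definitions the negative branch saturates at -1/sqrt(α), which is how the function produces the bounded negative values the abstract refers to. The only non-trivial operation is an inverse square root, where ELU needs an exponential.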

Taxonomy

Topics: Advanced Neural Network Applications · Machine Learning and ELM · Stochastic Gradient Optimization Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Gated Recurrent Unit · Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Exponential Linear Unit