Dual Rectified Linear Units (DReLUs): A Replacement for Tanh Activation Functions in Quasi-Recurrent Neural Networks

Fréderic Godin, Jonas Degrave, Joni Dambre, Wesley De Neve

arXiv:1707.08214 · cs.CL · March 1, 2019

TL;DR

This paper introduces DReLUs, a new activation function for QRNNs that improves training stability and performance, enabling deeper models and outperforming traditional tanh-based QRNNs and LSTMs in language tasks.

Contribution

The paper proposes DReLUs as a drop-in replacement for tanh in QRNNs, demonstrating improved performance and the ability to train deeper networks.

Findings

1. DReLUs reduce vanishing gradient issues in QRNNs.

2. DReLUs enable stacking of up to eight layers in character-level language modeling.

3. DReLU-based QRNNs outperform tanh-based QRNNs and LSTMs in experiments.

Abstract

In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) (Bradbury et al. (2017)). Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. (2017) and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to…
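The abstract describes a DReLU only by its properties (an unbounded positive and negative image, used in place of tanh in the recurrent step). A minimal sketch of how such a unit might look, assuming the DReLU is the difference of two ReLUs applied to two separate pre-activations, and assuming a QRNN-style forget-gate recurrence h_t = f_t * h_{t-1} + (1 - f_t) * z_t. Neither formula is stated on this page, and the function and argument names below are hypothetical, chosen for illustration only.

```python
import math


def relu(x):
    """Standard rectified linear unit: max(0, x)."""
    return max(0.0, x)


def drelu(a, b):
    """Assumed DReLU form: difference of two ReLUs.

    Positive when a dominates, negative when b dominates, and exactly
    zero when both inputs are non-positive (a sparse activation).
    """
    return relu(a) - relu(b)


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def qrnn_recurrence(cand_a, cand_b, forget_pre, h0=0.0):
    """Scalar sketch of a QRNN-style recurrent step with a DReLU candidate.

    cand_a, cand_b: per-timestep candidate pre-activations (hypothetical
        stand-ins for the outputs of two convolutions over the input).
    forget_pre: per-timestep forget-gate pre-activations.
    Returns the list of hidden states, one per timestep.
    """
    h = h0
    states = []
    for a, b, f_pre in zip(cand_a, cand_b, forget_pre):
        z = drelu(a, b)        # used where a tanh candidate would be
        f = sigmoid(f_pre)     # forget gate, strictly between 0 and 1
        h = f * h + (1.0 - f) * z
        states.append(h)
    return states
```

Unlike tanh, whose output is confined to (-1, 1), this form can produce values of any magnitude in either direction, which matches the "unbounded positive and negative image" mentioned in the abstract.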

Taxonomy

Methods: Convolution · Sigmoid Activation · Masked Convolution · Quasi-Recurrent Neural Network · Tanh Activation