TL;DR
This paper introduces DReLUs, a new activation function for QRNNs that improves training stability and performance, enabling deeper models and outperforming traditional tanh-based QRNNs and LSTMs in language tasks.
Contribution
The paper proposes DReLUs as a drop-in replacement for tanh in QRNNs, demonstrating improved performance and the ability to train deeper networks.
Findings
Like ReLUs, DReLUs are less prone to the vanishing gradient problem, are noise robust, and induce sparse activations.
DReLUs make it possible to stack up to eight QRNN layers in character-level language modeling.
DReLU-based QRNNs outperform tanh-based QRNNs and LSTMs in experiments.
Abstract
In this paper, we introduce a novel type of Rectified Linear Unit (ReLU), called a Dual Rectified Linear Unit (DReLU). A DReLU, which comes with an unbounded positive and negative image, can be used as a drop-in replacement for a tanh activation function in the recurrent step of Quasi-Recurrent Neural Networks (QRNNs) (Bradbury et al. (2017)). Similar to ReLUs, DReLUs are less prone to the vanishing gradient problem, they are noise robust, and they induce sparse activations. We independently reproduce the QRNN experiments of Bradbury et al. (2017) and compare our DReLU-based QRNNs with the original tanh-based QRNNs and Long Short-Term Memory networks (LSTMs) on sentiment classification and word-level language modeling. Additionally, we evaluate on character-level language modeling, showing that we are able to stack up to eight QRNN layers with DReLUs, thus making it possible to…
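To make the "unbounded positive and negative image" concrete, here is a minimal sketch in plain Python. It assumes the commonly cited DReLU form, max(0, a) - max(0, b), which takes two pre-activations instead of one; that formula is not quoted in the abstract above, so treat it as an assumption. The `f_pooling` helper and its scalar, single-channel setup are hypothetical and only illustrate where the DReLU output would take the place of a tanh candidate in a QRNN-style recurrent step.

```python
def drelu(a: float, b: float) -> float:
    """Dual ReLU (assumed form): difference of two rectified inputs.

    The result is positive when only `a` is active, negative when only
    `b` is active, and exactly zero when both are inactive (sparsity).
    """
    return max(0.0, a) - max(0.0, b)


def f_pooling(candidates, forget_gates, h0=0.0):
    """QRNN-style f-pooling on one scalar channel (hypothetical helper).

    Implements h_t = f_t * h_{t-1} + (1 - f_t) * z_t, where z_t is the
    candidate value. In a tanh-based QRNN, z_t would be tanh(...); here
    the caller passes DReLU outputs instead.
    """
    h = h0
    states = []
    for z, f in zip(candidates, forget_gates):
        h = f * h + (1.0 - f) * z
        states.append(h)
    return states


# Illustrative pre-activation pairs (made-up numbers, not from the paper).
pairs = [(3.0, -1.0), (-2.0, 5.0), (-1.0, -4.0), (2.0, 2.0)]
z = [drelu(a, b) for a, b in pairs]          # [3.0, -5.0, 0.0, 0.0]

# Forget gates would normally come from a sigmoid; fixed at 0.5 here.
h = f_pooling(z, [0.5, 0.5, 0.5, 0.5])       # [1.5, -1.75, -0.875, -0.4375]
```

Unlike tanh, whose output is confined to (-1, 1), the candidate values here reach 3.0 and -5.0, and two of the four outputs are exactly zero.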
Taxonomy
Methods: Convolution · Sigmoid Activation · Masked Convolution · Quasi-Recurrent Neural Network · Tanh Activation
