Quasi-Recurrent Neural Networks
James Bradbury, Stephen Merity, Caiming Xiong, Richard Socher

TL;DR
QRNNs are a neural network architecture that alternates convolutional layers with a minimalist recurrent pooling function. They train and run faster than traditional RNNs such as LSTMs while matching or improving predictive accuracy.
Contribution
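This paper introduces QRNNs, a sequence modeling approach that increases parallelism and speed without sacrificing accuracy. Stacked QRNNs outperform stacked LSTMs of the same hidden size on language modeling, sentiment classification, and character-level neural machine translation.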
Findings
QRNNs are up to 16 times faster than LSTMs in training and testing.
Stacked QRNNs outperform stacked LSTMs of the same size in predictive accuracy.
QRNNs are effective for language modeling, sentiment classification, and character-level neural machine translation.
Abstract
Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels. Despite lacking trainable recurrent layers, stacked QRNNs have better predictive accuracy than stacked LSTMs of the same hidden size. Due to their increased parallelism, they are up to 16 times faster at train and test time. Experiments on language modeling, sentiment classification, and character-level neural machine translation demonstrate these advantages and underline the viability of QRNNs as a basic…
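The alternation described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: it assumes the "fo-pooling" variant (forget and output gates), a causal (masked) convolution over time, and a zero initial state. The function names (masked_conv, fo_pool, qrnn_layer, init_params) and the random initialization are hypothetical and chosen only for this example.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def masked_conv(xs, weights, bias):
    """Causal 1-D convolution over time.

    xs:      list of T input vectors, each of length n_in
    weights: list of k matrices (oldest offset first), each n_out x n_in
    bias:    list of n_out floats
    The output at time t depends only on xs[t-k+1 .. t]; positions before
    the start of the sequence are zero-padded. Every timestep is independent
    of the others, so this part could run in parallel across time.
    """
    k = len(weights)
    n_in = len(xs[0])
    padded = [[0.0] * n_in for _ in range(k - 1)] + xs
    out = []
    for t in range(len(xs)):
        window = padded[t:t + k]  # inputs x_{t-k+1} .. x_t
        row = []
        for j in range(len(bias)):
            acc = bias[j]
            for w, x in zip(weights, window):
                acc += sum(wji * xi for wji, xi in zip(w[j], x))
            row.append(acc)
        out.append(row)
    return out


def fo_pool(z, f, o):
    """Recurrent pooling with no trainable weights.

    c_t = f_t * c_{t-1} + (1 - f_t) * z_t
    h_t = o_t * c_t
    All products are elementwise, so each channel is independent of the
    others; only this loop is sequential in time.
    """
    c = [0.0] * len(z[0])  # assumed zero initial state
    hs = []
    for z_t, f_t, o_t in zip(z, f, o):
        c = [ft * ci + (1.0 - ft) * zt for ft, ci, zt in zip(f_t, c, z_t)]
        hs.append([ot * ci for ot, ci in zip(o_t, c)])
    return hs


def qrnn_layer(xs, params):
    """One QRNN layer: three masked convolutions followed by fo-pooling."""
    z = [[math.tanh(v) for v in row] for row in masked_conv(xs, *params["z"])]
    f = [[sigmoid(v) for v in row] for row in masked_conv(xs, *params["f"])]
    o = [[sigmoid(v) for v in row] for row in masked_conv(xs, *params["o"])]
    return fo_pool(z, f, o)


def init_params(n_in, n_out, k, seed=0):
    """Hypothetical small random initialization, for illustration only."""
    rng = random.Random(seed)

    def conv():
        w = [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
              for _ in range(n_out)] for _ in range(k)]
        return w, [0.0] * n_out

    return {"z": conv(), "f": conv(), "o": conv()}


if __name__ == "__main__":
    params = init_params(n_in=3, n_out=4, k=2)
    xs = [[0.1 * t, -0.2, 0.3] for t in range(5)]
    for h_t in qrnn_layer(xs, params):
        print([round(v, 4) for v in h_t])
```

The only sequential work is the elementwise loop in fo_pool, which has no weight matrices. All matrix arithmetic lives in the convolutions, which is where the parallelism across timesteps comes from.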
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Masked Convolution · Adam · Zoneout · Stochastic Gradient Descent · GloVe Embeddings · RMSProp · Weight Decay · Dropout · Tanh Activation · Sigmoid Activation
