Quasi-Recurrent Neural Networks

James Bradbury; Stephen Merity; Caiming Xiong; Richard Socher

arXiv:1611.01576 · cs.NE · November 22, 2016 · 326 cites

TL;DR

QRNNs are a neural sequence-modeling architecture that alternates convolutional layers with a minimalist recurrent pooling function, giving faster training and inference while matching or improving on the predictive accuracy of conventional RNNs such as LSTMs.
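For orientation, a single QRNN layer with fo-pooling (one of the pooling variants described in the full paper; this page does not list the equations) can be sketched as below. Here * denotes a masked convolution along time with filter width k, so timestep t sees only inputs up to x_t; σ is the logistic sigmoid and ⊙ is elementwise multiplication. The second line spells out the k = 2 case for one timestep.

```latex
\begin{aligned}
Z &= \tanh(W_z * X), \quad F = \sigma(W_f * X), \quad O = \sigma(W_o * X) \\
z_t &= \tanh\!\left(W_z^{1} x_{t-1} + W_z^{2} x_t\right) \qquad (k = 2) \\
c_t &= f_t \odot c_{t-1} + (1 - f_t) \odot z_t \\
h_t &= o_t \odot c_t
\end{aligned}
```

The gate sequences Z, F and O are computed for every timestep at once; only the last two lines are recurrent, and they act on each channel independently.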

Contribution

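This paper introduces QRNNs, a sequence modeling approach that increases parallelism and speed without sacrificing accuracy, outperforming stacked LSTMs of the same hidden size on language modeling, sentiment classification, and character-level neural machine translation.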

Findings

1. QRNNs are up to 16 times faster than LSTMs at train and test time.

2. Stacked QRNNs achieve better predictive accuracy than stacked LSTMs of the same hidden size.

3. QRNNs are effective for language modeling, sentiment classification, and character-level neural machine translation.

Abstract

Recurrent neural networks are a powerful tool for modeling sequential data, but the dependence of each timestep's computation on the previous timestep's output limits parallelism and makes RNNs unwieldy for very long sequences. We introduce quasi-recurrent neural networks (QRNNs), an approach to neural sequence modeling that alternates convolutional layers, which apply in parallel across timesteps, and a minimalist recurrent pooling function that applies in parallel across channels. Despite lacking trainable recurrent layers, stacked QRNNs have better predictive accuracy than stacked LSTMs of the same hidden size. Due to their increased parallelism, they are up to 16 times faster at train and test time. Experiments on language modeling, sentiment classification, and character-level neural machine translation demonstrate these advantages and underline the viability of QRNNs as a basic…
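The split described in the abstract (convolutions parallel across timesteps, pooling parallel across channels) can be made concrete with a minimal NumPy sketch of one QRNN layer. Assumptions: it uses fo-pooling, a single unbatched sequence, no bias terms, and no zoneout or dropout; the function names masked_conv, fo_pool and qrnn_layer are hypothetical and are not taken from any released implementation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def masked_conv(x, w):
    """Causal ("masked") 1-D convolution along time.

    x: (T, n_in) sequence; w: (k, n_in, n_out) filters.
    Output row t depends only on x[t-k+1 .. t] (left zero-padding),
    so all T rows are computed at once with no look-ahead.
    """
    k, T = w.shape[0], x.shape[0]
    padded = np.concatenate([np.zeros((k - 1, x.shape[1])), x], axis=0)
    # Row t of padded[j : j + T] is x[t - (k - 1 - j)], i.e. a delayed copy
    # of x; w[k - 1] therefore multiplies the current input x[t].
    return sum(padded[j:j + T] @ w[j] for j in range(k))


def fo_pool(z, f, o, c0=None):
    """fo-pooling: c_t = f_t * c_{t-1} + (1 - f_t) * z_t,  h_t = o_t * c_t.

    The loop over t is the only sequential part, and every operation in it
    is elementwise, so channels are independent of one another.
    """
    T, n = z.shape
    c = np.zeros(n) if c0 is None else c0
    h = np.empty_like(z)
    for t in range(T):
        c = f[t] * c + (1.0 - f[t]) * z[t]
        h[t] = o[t] * c
    return h, c


def qrnn_layer(x, w_z, w_f, w_o):
    """One QRNN layer: three parallel masked convolutions, then fo-pooling."""
    z = np.tanh(masked_conv(x, w_z))   # candidate vectors
    f = sigmoid(masked_conv(x, w_f))   # forget gates
    o = sigmoid(masked_conv(x, w_o))   # output gates
    return fo_pool(z, f, o)


# Usage: a 5-step sequence, 3 input channels, 4 hidden channels, filter width 2.
rng = np.random.default_rng(0)
T, n_in, n_hid, k = 5, 3, 4, 2
x = rng.standard_normal((T, n_in))
w_z, w_f, w_o = (0.1 * rng.standard_normal((k, n_in, n_hid)) for _ in range(3))
h, c_last = qrnn_layer(x, w_z, w_f, w_o)
print(h.shape)  # (5, 4)
```

Stacking layers amounts to feeding h into another qrnn_layer call. Any real speedup depends on running the convolutions and the elementwise pooling efficiently on accelerator hardware, which this NumPy sketch does not attempt.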

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Masked Convolution · Adam · Zoneout · Stochastic Gradient Descent · GloVe Embeddings · RMSProp · Weight Decay · Dropout · Tanh Activation · Sigmoid Activation