Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training
Anup Sarma, Sonali Singh, Huaipan Jiang, Rui Zhang, Mahmut T Kandemir, and Chita R Das

TL;DR
This paper introduces a structured dropout method for LSTM RNNs that induces sparsity in hidden states, enabling significant training speedups without loss of accuracy across multiple NLP tasks.
Contribution
The authors propose a novel dropout pattern that creates structured sparsity in LSTM hidden states, so that the computation tied to dropped values can be skipped during training.
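This summary does not spell out the exact dropout pattern, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes, going by the title, that "structured in space" means the same hidden units are dropped for every sequence in a batch, and that "randomized in time" means a fresh mask is drawn at each time step. The function names, shapes, and the keep probability of 0.5 are hypothetical. The sketch shows why such a pattern reduces work: when whole hidden units are dropped, the masked matrix product equals a smaller dense product over the kept units only.

```python
import numpy as np

def sample_unit_mask(hidden_size, keep_prob, rng):
    # Assumption: one keep/drop decision per hidden unit, shared by the whole batch.
    return rng.random(hidden_size) < keep_prob

def dense_dropout_matmul(h, W, keep, keep_prob):
    # Reference path: zero the dropped units, rescale (inverted dropout), multiply in full.
    return (h * keep / keep_prob) @ W

def compact_dropout_matmul(h, W, keep, keep_prob):
    # Structured path: keep only the surviving columns of h and the matching rows of W,
    # so the matrix product runs over a smaller inner dimension.
    idx = np.flatnonzero(keep)
    return (h[:, idx] / keep_prob) @ W[idx, :]

rng = np.random.default_rng(0)
batch, hidden, out, steps = 4, 8, 16, 3      # hypothetical sizes
h = rng.standard_normal((batch, hidden))     # stand-in for an LSTM hidden state
W = rng.standard_normal((hidden, out))       # stand-in for a recurrent weight matrix

for t in range(steps):
    keep = sample_unit_mask(hidden, 0.5, rng)  # new mask at every time step
    dense = dense_dropout_matmul(h, W, keep, 0.5)
    compact = compact_dropout_matmul(h, W, keep, 0.5)
    assert np.allclose(dense, compact)         # same result, fewer multiply-adds
```

With element-wise random dropout, by contrast, each row of the batch would lose different units, and no single set of rows of W could be skipped for the whole batch.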
Findings
Training speedups of 1.23x to 1.64x
Task performance metrics remained comparable, with no loss of accuracy
Validated on language modeling, translation, and named entity recognition (NER) tasks
Abstract
Recurrent Neural Networks (RNNs), more specifically their Long Short-Term Memory (LSTM) variants, have been widely used as a deep learning tool for tackling sequence-based learning tasks in text and speech. Training of such LSTM applications is computationally intensive due to the recurrent nature of hidden state computation that repeats for each time step. While sparsity in Deep Neural Nets has been widely seen as an opportunity for reducing computation time in both training and inference phases, the usage of non-ReLU activation in LSTM RNNs renders the opportunities for such dynamic sparsity associated with neuron activation and gradient values to be limited or non-existent. In this work, we identify dropout induced sparsity for LSTMs as a suitable mode of computation reduction. Dropout is a widely used regularization mechanism, which randomly drops computed neuron values during each…
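The abstract's point about non-ReLU activations can be illustrated with a small NumPy sketch. It is an illustration on synthetic standard-normal inputs, not an experiment from the paper, and the helper name `zero_fraction` is hypothetical. ReLU maps every negative input to exactly zero, while the sigmoid and tanh functions used in LSTM gates produce small but nonzero values, so they leave essentially no exact zeros to skip.

```python
import numpy as np

def zero_fraction(a):
    # Fraction of entries that are exactly zero, i.e. computation that could be skipped.
    return float(np.mean(a == 0.0))

rng = np.random.default_rng(0)
x = rng.standard_normal(100_000)   # synthetic pre-activations (assumption)

relu_zeros = zero_fraction(np.maximum(x, 0.0))           # roughly half the entries
sigmoid_zeros = zero_fraction(1.0 / (1.0 + np.exp(-x)))  # sigmoid output is strictly positive here
tanh_zeros = zero_fraction(np.tanh(x))                   # zero only if an input is exactly 0

print(relu_zeros, sigmoid_zeros, tanh_zeros)
```

Dropout, by contrast, sets values to exactly zero regardless of the activation function, which is why it can serve as a source of sparsity in LSTMs.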
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Dropout
