Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training

Anup Sarma; Sonali Singh; Huaipan Jiang; Rui Zhang; Mahmut T Kandemir; and Chita R Das

arXiv:2106.12089·cs.LG·June 24, 2021

TL;DR

This paper introduces a structured dropout method for LSTM RNNs that induces sparsity in hidden states, enabling significant training speedups without loss of accuracy across multiple NLP tasks.

Contribution

The authors propose a novel dropout pattern that creates structured sparsity in LSTM hidden states, facilitating efficient computation reduction during training.
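The idea of structured sparsity can be illustrated with a small sketch. The summary above does not spell out the exact dropout pattern, so the following rests on an assumption drawn from the paper's title: the same hidden units are dropped for every sequence in the batch (structured in space), and a fresh mask is sampled at each time step (randomized in time). All function names here are hypothetical and are not taken from the authors' code. The point is only that when whole hidden units are dropped, the matrix multiply can skip them instead of multiplying by zeros.

```python
import random


def sample_unit_mask(hidden_size, drop_prob, rng):
    """One keep/drop flag per hidden unit, shared by every sequence in the batch."""
    return [rng.random() >= drop_prob for _ in range(hidden_size)]


def masked_dense_matmul(h, w, keep, drop_prob):
    """Baseline: zero the dropped units, then run the full-size multiply."""
    scale = 1.0 / (1.0 - drop_prob)  # inverted-dropout rescaling (assumed)
    masked = [[x * scale if flag else 0.0 for x, flag in zip(row, keep)] for row in h]
    return [[sum(row[k] * w[k][j] for k in range(len(w))) for j in range(len(w[0]))]
            for row in masked]


def compacted_matmul(h, w, keep, drop_prob):
    """Same result, but dropped units never enter the inner loop."""
    scale = 1.0 / (1.0 - drop_prob)
    kept = [k for k, flag in enumerate(keep) if flag]
    return [[sum(row[k] * scale * w[k][j] for k in kept) for j in range(len(w[0]))]
            for row in h]


if __name__ == "__main__":
    rng = random.Random(0)
    batch, hidden, out, drop_prob = 4, 8, 3, 0.5
    w = [[rng.uniform(-1, 1) for _ in range(out)] for _ in range(hidden)]
    for t in range(3):  # a new mask is drawn at every time step
        h = [[rng.uniform(-1, 1) for _ in range(hidden)] for _ in range(batch)]
        keep = sample_unit_mask(hidden, drop_prob, rng)
        dense = masked_dense_matmul(h, w, keep, drop_prob)
        compact = compacted_matmul(h, w, keep, drop_prob)
        same = all(abs(a - b) < 1e-9
                   for ra, rb in zip(dense, compact) for a, b in zip(ra, rb))
        print(f"step {t}: {sum(keep)} of {hidden} units kept, results match: {same}")
```

In this sketch the work in `compacted_matmul` scales with the number of kept units rather than the full hidden size. With an unstructured mask (a different pattern per sequence) there would be no shared set of units to skip, which is why the structure matters for saving computation.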

Findings

01 Achieved training speedups of 1.23x to 1.64x

02 Maintained comparable performance metrics

03 Validated across language modeling, translation, and NER tasks

Abstract

Recurrent Neural Networks (RNNs), more specifically their Long Short-Term Memory (LSTM) variants, have been widely used as a deep learning tool for tackling sequence-based learning tasks in text and speech. Training of such LSTM applications is computationally intensive due to the recurrent nature of hidden state computation that repeats for each time step. While sparsity in Deep Neural Nets has been widely seen as an opportunity for reducing computation time in both training and inference phases, the usage of non-ReLU activation in LSTM RNNs renders the opportunities for such dynamic sparsity associated with neuron activation and gradient values to be limited or non-existent. In this work, we identify dropout-induced sparsity for LSTMs as a suitable mode of computation reduction. Dropout is a widely used regularization mechanism, which randomly drops computed neuron values during each…
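The abstract's remark about non-ReLU activations can be checked with a quick experiment. The sketch below is an illustration and not part of the paper: it feeds the same Gaussian pre-activations (an assumed input distribution) through ReLU, tanh, and sigmoid and counts how many outputs are exactly zero. ReLU zeroes every negative input, whereas tanh is zero only at an input of exactly zero and sigmoid stays positive for inputs of this magnitude, so the LSTM's activations offer essentially no exact zeros to skip.

```python
import math
import random


def relu(x):
    return max(0.0, x)


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def zero_fraction(values):
    """Fraction of entries that are exactly zero, i.e. skippable work."""
    return sum(1 for v in values if v == 0.0) / len(values)


rng = random.Random(0)
pre_activations = [rng.gauss(0.0, 1.0) for _ in range(10000)]  # assumed input distribution

fractions = {
    name: zero_fraction([fn(x) for x in pre_activations])
    for name, fn in [("relu", relu), ("tanh", math.tanh), ("sigmoid", sigmoid)]
}

for name, frac in fractions.items():
    print(f"{name}: {frac:.3f} of outputs are exactly zero")
```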

Videos

Structured in Space, Randomized in Time: Leveraging Dropout in RNNs for Efficient Training · SlidesLive

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Advanced Neural Network Applications

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Dropout