# Reducing state updates via Gaussian-gated LSTMs

**Authors:** Matthew Thornton, Jithendar Anumula, Shih-Chii Liu

arXiv: 1901.07334 · 2019-01-23

## TL;DR

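This paper introduces the Gaussian-gated LSTM (g-LSTM), an LSTM variant whose learnable Gaussian time gates limit when each neuron's state is updated. The authors report better long-term dependency learning than a standard LSTM, fewer computes per network update (at least 10x fewer when a computational budget term is added to the training loss), and shorter convergence time on long sequences when a temporal curriculum learning schedule is used.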

## Contribution

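The paper proposes a timing-gated LSTM in which a learnable Gaussian time gate controls when each neuron can be updated. Limiting updates preserves memory over longer temporal intervals, improves error-gradient flow, and reduces the number of computes needed per network update compared with a standard LSTM.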
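To make the idea of a Gaussian time gate concrete, here is a minimal single-neuron sketch. This summary does not give the gate equations, so the form used here is an assumption: the gate openness is taken to be `exp(-(t - mu)^2 / sigma^2)`, and the new state is a blend of a candidate value and the previous state weighted by that openness. The names `gaussian_time_gate`, `gated_update`, and `run_unit` are hypothetical, and the candidate values stand in for what a real LSTM cell would compute.

```python
import math


def gaussian_time_gate(t, mu, sigma):
    """Assumed gate form: openness in (0, 1], peaking at time step t == mu."""
    return math.exp(-((t - mu) ** 2) / (sigma ** 2))


def gated_update(k, candidate, previous):
    """Blend candidate and previous state; k near 0 keeps the old state."""
    return k * candidate + (1.0 - k) * previous


def run_unit(candidates, mu, sigma, initial_state=0.0):
    """Run one neuron over a sequence of candidate states (time steps start at 1)."""
    state = initial_state
    states = []
    for t, candidate in enumerate(candidates, start=1):
        k = gaussian_time_gate(t, mu, sigma)
        state = gated_update(k, candidate, state)
        states.append(state)
    return states


# Illustrative values only: the candidate at step t is simply t,
# and the gate is centred on step 5 with a narrow width.
candidates = [float(t) for t in range(1, 11)]
states = run_unit(candidates, mu=5.0, sigma=0.5)
```

In this toy run the state barely moves before step 5, takes the candidate value at step 5 where the gate is fully open, and is then held almost unchanged for the remaining steps. That holding behaviour is the sense in which a time gate reduces state updates and preserves memory.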

## Key findings

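- The g-LSTM captures long-term temporal dependencies better than a standard LSTM, and its time gate parameters can be learned even from non-optimal initialization values
- Adding a computational budget term to the training loss yields a network that reduces the number of computes by at least 10x
- A temporal curriculum learning schedule for the g-LSTM reduces convergence time on long sequences relative to the equivalent LSTM network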

## Abstract

Recurrent neural networks can be difficult to train on long sequence data due to the well-known vanishing gradient problem. Some architectures incorporate methods to reduce RNN state updates, therefore allowing the network to preserve memory over long temporal intervals. To address these problems of convergence, this paper proposes a timing-gated LSTM RNN model, called the Gaussian-gated LSTM (g-LSTM). The time gate controls when a neuron can be updated during training, enabling longer memory persistence and better error-gradient flow. This model captures long-temporal dependencies better than an LSTM and the time gate parameters can be learned even from non-optimal initialization values. Because the time gate limits the updates of the neuron state, the number of computes needed for the network update is also reduced. By adding a computational budget term to the training loss, we can obtain a network which further reduces the number of computes by at least 10x. Finally, by employing a temporal curriculum learning schedule for the g-LSTM, we can reduce the convergence time of the equivalent LSTM network on long sequences.
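The computational budget term mentioned in the abstract can be illustrated with a small sketch. The exact form of the term is not given in this summary, so the version below is an assumption: it penalizes the mean openness of all time gates over the sequence, scaled by a hypothetical weight `budget_weight`, and reuses an assumed Gaussian gate of the form `exp(-(t - mu)^2 / sigma^2)`. The function names are hypothetical as well.

```python
import math


def mean_gate_openness(mus, sigmas, seq_len):
    """Average gate openness over all neurons and time steps 1..seq_len."""
    total = 0.0
    for mu, sigma in zip(mus, sigmas):
        for t in range(1, seq_len + 1):
            # Assumed Gaussian gate form, not taken from the paper text.
            total += math.exp(-((t - mu) ** 2) / (sigma ** 2))
    return total / (len(mus) * seq_len)


def loss_with_budget(task_loss, mus, sigmas, seq_len, budget_weight):
    """Task loss plus a penalty that grows with how open the gates are."""
    return task_loss + budget_weight * mean_gate_openness(mus, sigmas, seq_len)


# Illustrative comparison: a wide gate is open for most of the sequence,
# a narrow gate only briefly, so the narrow gate incurs a smaller penalty.
wide = mean_gate_openness([50.0], [40.0], seq_len=100)
narrow = mean_gate_openness([50.0], [2.0], seq_len=100)
```

Under this assumed penalty, training is pushed toward narrower gates, which means fewer time steps at which a neuron's state needs to be recomputed.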

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/1901.07334/full.md

## Figures

46 figures with captions in the complete paper: https://tomesphere.com/paper/1901.07334/full.md

## References

22 references — full list in the complete paper: https://tomesphere.com/paper/1901.07334/full.md

---
Source: https://tomesphere.com/paper/1901.07334