Long Short-Term Memory as a Dynamically Computed Element-wise Weighted Sum

Omer Levy; Kenton Lee; Nicholas FitzGerald; Luke Zettlemoyer

arXiv:1805.03716·cs.CL·May 11, 2018

TL;DR

This paper offers a new perspective on LSTMs: ablations show that the gating mechanism alone performs as well as a full LSTM in most settings, indicating that the gates have more representational power than previously appreciated.

Contribution

It introduces a new class of RNNs by decoupling the LSTM's gates from the embedded simple RNN, and shows that the gates are versatile recurrent models in their own right rather than only a remedy for vanishing gradients.

Findings

01

The gating mechanism alone matches LSTM performance in most settings.

02

With the gates decoupled, the recurrence computes an element-wise weighted sum of context-independent functions of the input.

03

Gates provide significant representational power beyond vanishing gradient control.
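
The weighted-sum view in finding 02 can be written out by unrolling the gated recurrence. The sketch below uses standard LSTM notation, which is an assumption here since this summary defines no symbols: $i_t$ and $f_t$ are the input and forget gates, $\tilde{c}_t$ is a context-independent function of the input $x_t$, $\circ$ is the element-wise product, and the initial state $c_{-1}$ is taken to be zero.

```latex
\begin{aligned}
c_t &= i_t \circ \tilde{c}_t + f_t \circ c_{t-1} \\
    &= \sum_{j=0}^{t} \Bigl( i_j \circ \prod_{k=j+1}^{t} f_k \Bigr) \circ \tilde{c}_j
\end{aligned}
```

Each past input $\tilde{c}_j$ thus enters the state $c_t$ with a weight that is the product of its own input gate and every later forget gate, computed separately for each dimension.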

Abstract

LSTMs were introduced to combat vanishing gradients in simple RNNs by augmenting them with gated additive recurrent connections. We present an alternative view to explain the success of LSTMs: the gates themselves are versatile recurrent models that provide more representational power than previously appreciated. We do this by decoupling the LSTM's gates from the embedded simple RNN, producing a new class of RNNs where the recurrence computes an element-wise weighted sum of context-independent functions of the input. Ablations on a range of problems demonstrate that the gating mechanism alone performs as well as an LSTM in most settings, strongly suggesting that the gates are doing much more in practice than just alleviating vanishing gradients.
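
The "element-wise weighted sum" described in the abstract can be checked numerically. The following is a minimal sketch, not the authors' implementation: it assumes a gated update of the form c_t = i_t * u_t + f_t * c_{t-1} with a zero initial state, where u_t stands for a context-independent function of the input. The function names are hypothetical, and the gate values are random placeholders in (0, 1) rather than outputs of a trained model.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def recurrent_states(input_gates, forget_gates, contents):
    """Run c_t = i_t * u_t + f_t * c_{t-1} element-wise, starting from c = 0."""
    dim = len(contents[0])
    c = [0.0] * dim
    states = []
    for i_t, f_t, u_t in zip(input_gates, forget_gates, contents):
        c = [i * u + f * prev for i, u, f, prev in zip(i_t, u_t, f_t, c)]
        states.append(c)
    return states


def weighted_sum_states(input_gates, forget_gates, contents):
    """Compute the same states as explicit weighted sums over past inputs."""
    dim = len(contents[0])
    states = []
    for t in range(len(contents)):
        c_t = [0.0] * dim
        for j in range(t + 1):
            for d in range(dim):
                # Weight of input j at time t: i_j times every later forget gate.
                w = input_gates[j][d]
                for k in range(j + 1, t + 1):
                    w *= forget_gates[k][d]
                c_t[d] += w * contents[j][d]
        states.append(c_t)
    return states


# Placeholder data (assumption): random content vectors and random gate values.
random.seed(0)
T, dim = 6, 4
contents = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(T)]
input_gates = [[sigmoid(random.gauss(0, 1)) for _ in range(dim)] for _ in range(T)]
forget_gates = [[sigmoid(random.gauss(0, 1)) for _ in range(dim)] for _ in range(T)]

rec = recurrent_states(input_gates, forget_gates, contents)
unrolled = weighted_sum_states(input_gates, forget_gates, contents)

# Largest element-wise difference between the two computations.
max_gap = max(abs(a - b) for r, u in zip(rec, unrolled) for a, b in zip(r, u))
print(max_gap < 1e-9)
```

The two functions agree up to floating-point error for any gate values, which is the sense in which the recurrence is a weighted sum. In an actual LSTM the gates are themselves computed from the input and the previous hidden state, so the weights are determined dynamically at each step.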

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory