LSTM with Working Memory

Andrew Pulver; Siwei Lyu

arXiv:1605.01988 · cs.NE · April 3, 2017

TL;DR

This paper introduces a modified LSTM-like RNN architecture that remains simple, outperforms standard LSTM on the tasks the authors tested, and presents a new handwritten-digit benchmark for evaluating RNN performance.

Contribution

A modified LSTM-like RNN architecture that stays simple and efficient while achieving better results on the tested tasks, plus a new benchmark for assessing RNN capabilities.

Findings

01. The proposed architecture outperforms standard LSTM on the tasks tested.

02. The new handwritten-digit benchmark stresses several important network capabilities.

03. The architecture remains simple and efficient.

Abstract

Previous RNN architectures have largely been superseded by LSTM, or "Long Short-Term Memory". Since its introduction, there have been many variations on this simple design. However, it is still widely used and we are not aware of a gated-RNN architecture that outperforms LSTM in a broad sense while still being as simple and efficient. In this paper we propose a modified LSTM-like architecture. Our architecture is still simple and achieves better performance on the tasks that we tested. We also introduce a new RNN performance benchmark that uses handwritten digits and stresses several important network capabilities.
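For orientation, the baseline that the abstract compares against can be sketched as a single time step of a standard LSTM cell without peephole connections. This is a minimal illustration of the commonly used formulation, not the modified architecture proposed in the paper, which the abstract does not specify. The names `lstm_step`, `affine`, `zero_params`, and the `params` layout are hypothetical and chosen only for this sketch.

```python
import math


def sigmoid(x):
    """Logistic function used by the input, forget, and output gates."""
    return 1.0 / (1.0 + math.exp(-x))


def affine(W, U, b, x, h):
    """Compute W @ x + U @ h + b with plain Python lists."""
    return [
        sum(w * xi for w, xi in zip(W[j], x))
        + sum(u * hi for u, hi in zip(U[j], h))
        + b[j]
        for j in range(len(b))
    ]


def lstm_step(params, x, h_prev, c_prev):
    """One step of a standard LSTM cell (baseline only, an assumption of
    this sketch; it is NOT the paper's proposed variant).

    params maps each of "i", "f", "o", "g" to a tuple (W, U, b).
    """
    i = [sigmoid(a) for a in affine(*params["i"], x, h_prev)]    # input gate
    f = [sigmoid(a) for a in affine(*params["f"], x, h_prev)]    # forget gate
    o = [sigmoid(a) for a in affine(*params["o"], x, h_prev)]    # output gate
    g = [math.tanh(a) for a in affine(*params["g"], x, h_prev)]  # candidate
    # New cell state: keep part of the old state, add part of the candidate.
    c = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c_prev, i, g)]
    # New hidden state: gated, squashed view of the cell state.
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c


def zero_params(n_in, n_hidden):
    """Hypothetical helper: all-zero weights, handy for a sanity check."""
    return {
        k: (
            [[0.0] * n_in for _ in range(n_hidden)],
            [[0.0] * n_hidden for _ in range(n_hidden)],
            [0.0] * n_hidden,
        )
        for k in "ifog"
    }


h, c = lstm_step(zero_params(2, 3), [1.0, 2.0], [0.0] * 3, [1.0] * 3)
```

With all-zero weights every gate evaluates to 0.5 and the candidate to 0, so the cell state is simply halved at each step. That makes the gating arithmetic easy to check by hand before substituting trained weights.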

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory