Working Memory Connections for LSTM

Federico Landi; Lorenzo Baraldi; Marcella Cornia; Rita Cucchiara

arXiv:2109.00020 · cs.LG · September 27, 2021

TL;DR

This paper introduces Working Memory Connections, a novel modification to LSTM gates that incorporates internal cell state information, leading to consistent performance improvements on various sequence modeling tasks.

Contribution

The paper proposes a new gating mechanism for LSTMs that directly integrates cell state information, addressing limitations of previous methods and enhancing long-term dependency learning.

Findings

01  Working Memory Connections improve LSTM performance across multiple tasks.

02  Inclusion of cell state information in gates is beneficial.

03  Previous methods failed due to key limitations that this work overcomes.

Abstract

Recurrent Neural Networks with Long Short-Term Memory (LSTM) make use of gating mechanisms to mitigate exploding and vanishing gradients when learning long-term dependencies. For this reason, LSTMs and other gated RNNs are widely adopted and are the de facto standard for many sequence modeling tasks. Although the memory cell inside the LSTM contains essential information, it is not allowed to influence the gating mechanism directly. In this work, we improve the gate potential by including information coming from the internal cell state. The proposed modification, named Working Memory Connection, consists of adding a learnable nonlinear projection of the cell content to the network gates. This modification fits into the classical LSTM gates without any assumption about the underlying task, and is particularly effective when dealing with longer sequences. Previous research effort in this…
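
The gate modification described in the abstract can be sketched as a single LSTM step in plain Python. This is an illustrative sketch under stated assumptions, not the authors' implementation: the abstract only says that a learnable nonlinear projection of the cell content is added to the gates, so the choice of tanh as the nonlinearity, the choice of which cell state each gate reads (previous state for the input and forget gates, updated state for the output gate), and every name below (`wmc_lstm_step`, `W_c`, `init_params`, and so on) are assumptions made for illustration.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * a for w, a in zip(row, v)) for row in W]


def init_gate(n_in, n_hid, rng, with_cell):
    """Hypothetical parameter layout for one gate; W_c is the cell projection."""
    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    gate = {"W_x": mat(n_hid, n_in), "W_h": mat(n_hid, n_hid), "b": [0.0] * n_hid}
    if with_cell:
        gate["W_c"] = mat(n_hid, n_hid)
    return gate


def init_params(n_in, n_hid, seed=0):
    rng = random.Random(seed)
    return {
        "i": init_gate(n_in, n_hid, rng, with_cell=True),   # input gate
        "f": init_gate(n_in, n_hid, rng, with_cell=True),   # forget gate
        "o": init_gate(n_in, n_hid, rng, with_cell=True),   # output gate
        "g": init_gate(n_in, n_hid, rng, with_cell=False),  # candidate update
    }


def pre_activation(gate, x, h_prev, c=None):
    """Standard gate pre-activation, plus tanh(W_c c) when a cell state is given."""
    z = [a + b + bias for a, b, bias in
         zip(matvec(gate["W_x"], x), matvec(gate["W_h"], h_prev), gate["b"])]
    if c is not None:
        # Assumed form of the working memory term: a tanh-squashed
        # learnable projection of the cell content.
        projection = [math.tanh(v) for v in matvec(gate["W_c"], c)]
        z = [a + p for a, p in zip(z, projection)]
    return z


def wmc_lstm_step(x, h_prev, c_prev, params):
    # Input and forget gates read the previous cell state (assumption).
    i = [sigmoid(z) for z in pre_activation(params["i"], x, h_prev, c_prev)]
    f = [sigmoid(z) for z in pre_activation(params["f"], x, h_prev, c_prev)]
    g = [math.tanh(z) for z in pre_activation(params["g"], x, h_prev)]
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    # Output gate reads the freshly updated cell state (assumption).
    o = [sigmoid(z) for z in pre_activation(params["o"], x, h_prev, c)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Usage: run two time steps on a toy 3-dimensional input sequence.
params = init_params(n_in=3, n_hid=4)
h, c = [0.0] * 4, [0.0] * 4
for x in ([1.0, 0.0, -1.0], [0.5, 0.5, 0.5]):
    h, c = wmc_lstm_step(x, h, c, params)
```

Dropping the `c` argument in the three gate calls recovers a classical LSTM step, which makes the extra term easy to ablate when comparing the two variants.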

Taxonomy

Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory