Fundamentals of Recurrent Neural Network (RNN) and Long Short-Term Memory (LSTM) Network

Alex Sherstinsky

arXiv:1808.03314·cs.LG·August 1, 2023

TL;DR

This paper formally derives the fundamentals of the RNN and the LSTM network, justifies the unrolling technique, reviews the difficulties of training the standard RNN, and introduces an extended LSTM variant presented as the most general.

Contribution

The paper formally derives the RNN and LSTM formulas, proves a precise statement that yields the unrolling technique, and proposes an extended LSTM model with new enhancements.

Findings

01. Formal derivation of RNN from differential equations

02. Proof of RNN unrolling technique

03. Introduction of the most general LSTM variant

Abstract

Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of "unrolling" an RNN is routinely presented without justification throughout the literature. The goal of this paper is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in signal processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the…
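To make the "unrolling" mentioned in the abstract concrete, here is a minimal sketch, assuming a generic scalar tanh recurrence rather than the canonical formulation the paper derives. The function names (`rnn_step`, `run_recurrent`, `run_unrolled_3`) and the parameter values are hypothetical, chosen only for illustration. It shows that applying one cell in a loop and writing out three weight-sharing copies of that cell produce the same states for a three-step input; it illustrates the idea and is not the paper's proof.

```python
import math


def rnn_step(h_prev, x, w_h, w_x, b):
    """One step of an assumed scalar recurrence: h_t = tanh(w_h*h_{t-1} + w_x*x_t + b)."""
    return math.tanh(w_h * h_prev + w_x * x + b)


def run_recurrent(xs, h0, w_h, w_x, b):
    """Apply the same cell repeatedly, feeding each state back in as the next input state."""
    h = h0
    states = []
    for x in xs:
        h = rnn_step(h, x, w_h, w_x, b)
        states.append(h)
    return states


def run_unrolled_3(xs, h0, w_h, w_x, b):
    """The same computation written as three explicit copies of the cell that share weights."""
    h1 = rnn_step(h0, xs[0], w_h, w_x, b)
    h2 = rnn_step(h1, xs[1], w_h, w_x, b)
    h3 = rnn_step(h2, xs[2], w_h, w_x, b)
    return [h1, h2, h3]


# Hypothetical inputs and weights, chosen only for illustration.
xs = [0.5, -1.0, 0.25]
params = dict(w_h=0.8, w_x=1.2, b=0.1)

recurrent = run_recurrent(xs, 0.0, **params)
unrolled = run_unrolled_3(xs, 0.0, **params)
print(recurrent == unrolled)
```

The unrolled form is a feed-forward chain of finite length, which is the shape that training by backpropagation through time operates on; the loop form is how the same network is usually run at inference.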

Taxonomy

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory