TL;DR
This paper formally derives the fundamentals of RNNs and LSTMs, justifies the unrolling technique, addresses the difficulties of training, and introduces an extended, most general LSTM variant. Its aim is to make these models easier to understand and implement.
Contribution
It formally derives the RNN and LSTM formulas, explains unrolling and proves the statement that justifies it, and proposes an extended LSTM model that augments the standard formulation.
Findings
Formal derivation of RNN from differential equations
Proof of RNN unrolling technique
Introduction of the most general LSTM variant
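The first two findings can be illustrated with a small sketch. It assumes a simplified continuous-time model, ds/dt = -s + W tanh(s) + U x, which is not necessarily the paper's exact formulation. A forward-Euler step of size 1 turns this model into the discrete update s[n] = W tanh(s[n-1]) + U x[n], because the leak term cancels. Unrolling then amounts to applying that same cell, with shared W and U, once per time step. The names `euler_step`, `rnn_step`, and `unroll`, the plain-list matrix layout, and the convention of feeding x[n] into the step from n-1 to n are all illustrative assumptions.

```python
import math


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def euler_step(s, x, W, U, dt):
    """One forward-Euler step of the assumed ODE ds/dt = -s + W tanh(s) + U x."""
    r = [math.tanh(si) for si in s]  # readout r = tanh(s)
    Wr, Ux = matvec(W, r), matvec(U, x)
    return [si + dt * (-si + a + b) for si, a, b in zip(s, Wr, Ux)]


def rnn_step(s_prev, x, W, U):
    """Discrete RNN update s[n] = W tanh(s[n-1]) + U x[n] (the dt = 1 case)."""
    r_prev = [math.tanh(si) for si in s_prev]
    return [a + b for a, b in zip(matvec(W, r_prev), matvec(U, x))]


def unroll(s0, xs, W, U):
    """Unrolling: apply the same cell (shared W, U) once per input in xs."""
    states, s = [], s0
    for x in xs:
        s = rnn_step(s, x, W, U)
        states.append(s)
    return states


W = [[0.5, -0.2], [0.1, 0.3]]  # hypothetical recurrent weights
U = [[1.0], [0.5]]             # hypothetical input weights
states = unroll([0.0, 0.0], [[1.0], [0.0], [-1.0]], W, U)
print(states[0])  # [1.0, 0.5]: with a zero initial state, only U x[0] contributes
```

Because the weights are shared across steps, the unrolled loop is a finite feed-forward computation over the sequence length, which is what allows standard backpropagation to be applied to it.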
Abstract
Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of "unrolling" an RNN is routinely presented without justification throughout the literature. The goal of this paper is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in signal processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the…
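As a point of reference for the move from the standard RNN to the LSTM, the sketch below computes one step of a widely used LSTM cell with a forget gate. It uses the sigmoid and tanh activations listed under Taxonomy. This is the common textbook cell, not the paper's extended variant, whose additional terms are not modeled here. The function names, the `params` dictionary keyed by gate name, and the convention that each weight matrix acts on the concatenation [h_prev; x] are assumptions made for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(W, b, v):
    """Compute W v + b for W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bi for row, bi in zip(W, b)]


def lstm_step(h_prev, c_prev, x, params):
    """One step of a standard forget-gate LSTM cell (illustrative sketch).

    params maps each gate name ('f', 'i', 'g', 'o') to a (W, b) pair whose
    W acts on the concatenated vector [h_prev; x]; this layout is assumed.
    """
    hx = h_prev + x  # list concatenation gives [h_prev; x]
    f = [sigmoid(z) for z in affine(*params["f"], hx)]    # forget gate
    i = [sigmoid(z) for z in affine(*params["i"], hx)]    # input gate
    g = [math.tanh(z) for z in affine(*params["g"], hx)]  # candidate values
    o = [sigmoid(z) for z in affine(*params["o"], hx)]    # output gate
    # Gated additive update of the cell state; this path is what eases
    # gradient flow compared with the standard RNN.
    c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c_prev, i, g)]
    h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h, c


# Hidden size 1, input size 1, all weights and biases zero (hypothetical values):
zero = ([[0.0, 0.0]], [0.0])
params = {"f": zero, "i": zero, "g": zero, "o": zero}
h, c = lstm_step([0.0], [2.0], [1.0], params)
print(c)  # [1.0]: every gate is sigmoid(0) = 0.5 and the candidate is tanh(0) = 0
```

With zero parameters the forget gate halves the previous cell state and the candidate contributes nothing, which makes the gating easy to check by hand.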
Taxonomy
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
