On the Long-Term Memory of Deep Recurrent Networks
Yoav Levine, Or Sharir, Alon Ziv, Amnon Shashua

TL;DR
This paper introduces a new measure called Start-End separation rank to quantify long-term memory in deep RNNs, demonstrating that depth significantly enhances their ability to model long-term dependencies.
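For a function of a length-T sequence, the Start-End separation rank is the minimal number K of summands needed to write it as Σ_{k=1..K} g_k(x_1..x_{T/2}) · h_k(x_{T/2+1}..x_T). A rank of 1 means the function factorizes into a Start part times an End part, i.e. it models no dependency between the two halves. When each x_t takes finitely many values, this equals the rank of the function's table with Start assignments as rows and End assignments as columns. The Python sketch below computes that rank numerically with an SVD. The helper `start_end_rank`, the tolerance, and the toy functions are hypothetical illustrations, not code from the paper.

```python
import itertools
import numpy as np


def start_end_rank(func, num_values: int, seq_len: int, rel_tol: float = 1e-9) -> int:
    """Rank of the table of func(seq) over all sequences in {0..num_values-1}^seq_len,
    with the Start half of each sequence indexing rows and the End half indexing columns."""
    half = seq_len // 2
    starts = list(itertools.product(range(num_values), repeat=half))
    ends = list(itertools.product(range(num_values), repeat=seq_len - half))
    table = np.array([[func(s + e) for e in ends] for s in starts], dtype=float)
    singular_values = np.linalg.svd(table, compute_uv=False)
    # Count singular values that are not numerically zero.
    return int(np.sum(singular_values > rel_tol * singular_values[0]))


M, T = 3, 4  # alphabet size and sequence length (illustrative choices)

# No Start-End dependency: a product g(start) * h(end) has rank 1.
print(start_end_rank(lambda seq: float(np.prod([v + 1 for v in seq])), M, T))  # -> 1
# Additive interaction g(start) + h(end) has rank 2.
print(start_end_rank(lambda seq: float(sum(seq)), M, T))  # -> 2
# "First symbol equals last symbol" needs one term per symbol: rank M.
print(start_end_rank(lambda seq: float(seq[0] == seq[-1]), M, T))  # -> 3
```

The higher the rank, the further the function is from treating the beginning and end of the sequence independently.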
Contribution
The paper proves that deep recurrent networks support Start-End separation ranks combinatorially higher than those of their shallow counterparts, establishing that depth greatly increases their capacity to model long-term dependencies.
Findings
Deep RNNs support higher Start-End separation ranks than shallow networks.
Depth combinatorially increases the ability to model long-term dependencies.
Empirical results confirm theoretical predictions on common RNN architectures.
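The depth effect can be observed numerically on a toy model. The sketch below builds small, randomly initialized networks in the spirit of the Recurrent Arithmetic Circuits the paper analyzes: each layer updates h_t = (W_h h_{t-1}) ⊙ (W_i x_t) (Multiplicative Integration, no nonlinearity), inputs are one-hot symbols, and a linear read-out is applied after the last step. The class `RAC`, the helper `start_end_rank`, Gaussian initialization, and the sizes M=3, R=3, T=4 are all assumptions made for illustration. A one-layer network can pass information from Start to End only through its R-dimensional hidden state at time T/2, so its table rank cannot exceed R. A second layer lets the End depend multiplicatively on that state, and the rank typically rises above R.

```python
import itertools
import numpy as np


def start_end_rank(func, num_values: int, seq_len: int, rel_tol: float = 1e-9) -> int:
    """Numerical rank of func's table: Start half of the sequence as rows, End half as columns."""
    half = seq_len // 2
    starts = list(itertools.product(range(num_values), repeat=half))
    ends = list(itertools.product(range(num_values), repeat=seq_len - half))
    table = np.array([[func(s + e) for e in ends] for s in starts], dtype=float)
    sv = np.linalg.svd(table, compute_uv=False)
    return int(np.sum(sv > rel_tol * sv[0]))


class RAC:
    """Hypothetical minimal Recurrent Arithmetic Circuit (illustrative, not the authors' code).

    Each layer applies Multiplicative Integration with no nonlinearity:
        h_t = (W_h @ h_{t-1}) * (W_i @ x_t),
    where x_t is a one-hot symbol (first layer) or the layer below's h_t.
    """

    def __init__(self, num_values: int, hidden: int, depth: int, rng: np.random.Generator):
        self.num_values = num_values
        self.layers = []
        in_dim = num_values
        for _ in range(depth):
            w_h = rng.standard_normal((hidden, hidden))   # recurrent weights
            w_i = rng.standard_normal((hidden, in_dim))   # input weights
            h0 = rng.standard_normal(hidden)              # initial hidden state
            self.layers.append((w_h, w_i, h0))
            in_dim = hidden
        self.w_out = rng.standard_normal(hidden)          # linear read-out of the top layer

    def __call__(self, seq) -> float:
        xs = [np.eye(self.num_values)[d] for d in seq]    # one-hot encode each symbol
        for w_h, w_i, h in self.layers:
            states = []
            for x in xs:
                h = (w_h @ h) * (w_i @ x)                 # Multiplicative Integration step
                states.append(h)
            xs = states                                   # feed this layer's states upward
        return float(self.w_out @ xs[-1])                 # score after the last time step


rng = np.random.default_rng(0)
M, R, T = 3, 3, 4   # alphabet size, hidden channels, sequence length (illustrative)
shallow_rank = start_end_rank(RAC(M, R, depth=1, rng=rng), M, T)
deep_rank = start_end_rank(RAC(M, R, depth=2, rng=rng), M, T)
print(f"shallow rank: {shallow_rank} (at most R={R}), deep rank: {deep_rank}")
```

The exact deep rank depends on the random draw, so the printed numbers illustrate the qualitative gap rather than reproduce the paper's bounds.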
Abstract
A key attribute that drives the unprecedented success of modern Recurrent Neural Networks (RNNs) on learning tasks that involve sequential data is their ability to model intricate long-term temporal dependencies. However, a well-established measure of RNNs' long-term memory capacity is lacking, and thus formal understanding of the effect of depth on their ability to correlate data throughout time is limited. Specifically, existing depth efficiency results on convolutional networks do not suffice to account for the success of deep RNNs on data of varying lengths. To address this, we introduce a measure of the network's ability to support information flow across time, referred to as the Start-End separation rank, which reflects the distance of the function realized by the recurrent network from modeling no dependency between the beginning and end of the input sequence.…
Taxonomy
Topics: Parallel Computing and Optimization Techniques · Quantum Computing Algorithms and Architecture · Quantum many-body systems
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
