Recurrent Neural Networks for Learning Long-term Temporal Dependencies with Reanalysis of Time Scale Representation
Kentaro Ohno, Atsutoshi Kumagai

TL;DR
This paper reanalyzes the interpretation of forget gates in gated RNNs as time scale representations, generalizes the theory to realistic input scenarios, and proposes a new RNN construction to better learn long-term dependencies, validated by experiments.
Contribution
It provides a more realistic theoretical understanding of forget gates in RNNs and introduces a novel RNN design for improved long-term dependency learning.
Findings
Existing RNNs satisfy the exponential gradient decay condition in the initial phase of training.
The proposed RNNs can represent longer time scales.
Experimental results show improved long-term dependency learning.
Abstract
Recurrent neural networks with a gating mechanism such as an LSTM or GRU are powerful tools to model sequential data. In the mechanism, a forget gate, which was introduced to control information flow in a hidden state in the RNN, has recently been re-interpreted as a representative of the time scale of the state, i.e., a measure of how long the RNN retains information on inputs. On the basis of this interpretation, several parameter initialization methods to exploit prior knowledge on temporal dependencies in data have been proposed to improve learnability. However, the interpretation relies on various unrealistic assumptions, such as that there are no inputs after a certain time point. In this work, we reconsider this interpretation of the forget gate in a more realistic setting. We first generalize the existing theory on gated RNNs so that we can consider the case where inputs are…
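The time scale reading of the forget gate can be illustrated with a minimal sketch. It assumes the idealized setting the abstract describes as unrealistic: a scalar state, a constant forget gate value f, and no inputs after time zero, so the state evolves as c_t = f * c_{t-1}. It is not the paper's generalized theory or its proposed RNN construction, and the function names are hypothetical, chosen only for this illustration.

```python
import math


def forget_gate_time_scale(f: float) -> float:
    """Number of steps T after which f**T has decayed to 1/e.

    Assumption: the forget gate is a constant 0 < f < 1 and no
    further inputs arrive, so the state shrinks by a factor f per step.
    """
    if not 0.0 < f < 1.0:
        raise ValueError("f must lie strictly between 0 and 1")
    return -1.0 / math.log(f)


def decay_state(c0: float, f: float, steps: int) -> float:
    """Iterate c_t = f * c_{t-1} for `steps` steps with no input."""
    c = c0
    for _ in range(steps):
        c = f * c
    return c


if __name__ == "__main__":
    # A forget gate closer to 1 corresponds to a longer time scale.
    for f in (0.5, 0.9, 0.99):
        T = forget_gate_time_scale(f)
        remaining = decay_state(1.0, f, round(T))
        print(f"f={f}: time scale ~{T:.1f} steps, "
              f"state after {round(T)} steps = {remaining:.3f}")
```

Under these assumptions, pushing f toward 1 lengthens the time scale without bound, which is the intuition behind initialization methods that set forget gate biases according to the expected range of temporal dependencies.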
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Machine Learning in Healthcare · Data Stream Mining Techniques
Methods: Tanh Activation · Sigmoid Activation · Long Short-Term Memory · Gated Recurrent Unit
