Analyzing and Exploiting NARX Recurrent Neural Networks for Long-Term Dependencies
Robert DiPietro, Christian Rupprecht, Nassir Navab, Gregory D. Hager

TL;DR
This paper introduces MIST RNNs, a NARX architecture that directly connects to distant past states, improving long-term dependency learning, efficiency, and gradient flow compared to LSTM and other NARX models.
Contribution
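The paper presents MIST RNNs, a NARX RNN architecture with direct connections from the distant past. It models long-term dependencies better than LSTM and Clockwork RNNs while requiring fewer computations than previously proposed NARX RNNs and LSTM.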
Findings
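MIST RNNs exhibit better vanishing-gradient properties than LSTM and previously proposed NARX RNNs.
They are far more efficient than previously proposed NARX RNN architectures and require even fewer computations than LSTM.
They substantially outperform LSTM and Clockwork RNNs on tasks that require long-term dependencies.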
Abstract
Recurrent neural networks (RNNs) have achieved state-of-the-art performance on many diverse tasks, from machine translation to surgical activity recognition, yet training RNNs to capture long-term dependencies remains difficult. To date, the vast majority of successful RNN architectures alleviate this problem using nearly-additive connections between states, as introduced by long short-term memory (LSTM). We take an orthogonal approach and introduce MIST RNNs, a NARX RNN architecture that allows direct connections from the very distant past. We show that MIST RNNs 1) exhibit superior vanishing-gradient properties in comparison to LSTM and previously-proposed NARX RNNs; 2) are far more efficient than previously-proposed NARX RNN architectures, requiring even fewer computations than LSTM; and 3) improve performance substantially over LSTM and Clockwork RNNs on tasks requiring very…
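To make "direct connections from the distant past" concrete, here is a minimal sketch of a generic NARX RNN recurrence in plain Python: each new state is computed from the current input plus several earlier states, not only the immediately preceding one. This is not the MIST RNN update itself. The function names, the delay set (1, 2, 4, 8), the tanh nonlinearity, the zero initial states, and the random weights are all assumptions chosen for illustration.

```python
import math
import random


def matvec(W, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def narx_rnn_forward(xs, W_x, W_h, b, delays):
    """Run a generic NARX RNN over a sequence and return all hidden states.

    xs     : list of input vectors, one per time step
    W_x    : input-to-hidden weights (hidden x input)
    W_h    : dict mapping each delay d to hidden-to-hidden weights (hidden x hidden)
    b      : bias vector (hidden)
    delays : which past states feed the current one (illustrative assumption)
    """
    states = []
    for t, x in enumerate(xs):
        # Contribution of the current input plus bias.
        pre = [bi + ai for bi, ai in zip(b, matvec(W_x, x))]
        for d in delays:
            # States before the start of the sequence are treated as zero.
            if t - d >= 0:
                rec = matvec(W_h[d], states[t - d])
                pre = [p + r for p, r in zip(pre, rec)]
        states.append([math.tanh(p) for p in pre])
    return states


def random_matrix(rows, cols, rng, scale=0.5):
    """Small random weights, used here only so the sketch is runnable."""
    return [[rng.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
input_size, hidden_size = 3, 4
delays = (1, 2, 4, 8)  # hypothetical delay set, not taken from the text above

W_x = random_matrix(hidden_size, input_size, rng)
W_h = {d: random_matrix(hidden_size, hidden_size, rng) for d in delays}
b = [0.0] * hidden_size
xs = [[rng.uniform(-1, 1) for _ in range(input_size)] for _ in range(20)]

states = narx_rnn_forward(xs, W_x, W_h, b, delays)
```

With a delay of 8 in the set, the state at step t depends on the state at step t - 8 through a single nonlinearity, whereas a standard RNN would need eight chained steps to carry the same information. The cost is one extra matrix-vector product per delay at every time step, which is the kind of overhead the abstract's efficiency claim is concerned with.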
Taxonomy
Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
