Residual Memory Networks: Feed-forward approach to learn long temporal dependencies
Murali Karthick Baskar, Martin Karafiat, Lukas Burget, Karel Vesely, Frantisek Grezl, Jan Honza Cernocky

TL;DR
This paper introduces the Residual Memory Network (RMN), a feed-forward architecture with residual and time-delay connections that learns long-term dependencies and hierarchical information efficiently and outperforms LSTM and BLSTM baselines in speech recognition.
Contribution
The paper proposes a novel residual memory neural network architecture that models long-term dependencies using deep feed-forward layers with residual and delay connections, reducing training complexity.
Findings
RMN outperforms LSTM and BLSTM in speech recognition tasks.
RMN achieves a 6% relative improvement over LSTM.
The bidirectional variant (BRMN) achieves a 3.8% relative improvement over BLSTM.
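As a reminder of what a relative improvement figure means, the short sketch below computes it from a baseline and a new error rate. The function name and the error rates in the usage note are hypothetical and chosen only to make the arithmetic easy to follow; they are not results from the paper.

```python
def relative_improvement(baseline_error: float, new_error: float) -> float:
    """Return the relative error reduction in percent.

    Both arguments are error rates in the same unit (for example,
    word error rate in percent). Lower is better.
    """
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - new_error) / baseline_error
```

For instance, with made-up error rates of 50.0 for a baseline and 47.0 for a new system, `relative_improvement(50.0, 47.0)` gives 6.0: an absolute gain of 3 points corresponds to a 6% relative gain.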
Abstract
Training deep recurrent neural network (RNN) architectures is complicated due to the increased network complexity. This disrupts the learning of higher-order abstractions using deep RNNs. In the case of feed-forward networks, training deep structures is simple and fast, but learning long-term temporal information is not possible. In this paper we propose a residual memory neural network (RMN) architecture to model short-time dependencies using deep feed-forward layers having residual and time-delayed connections. The residual connection paves the way to construct deeper networks by enabling unhindered flow of gradients, and the time delay units capture temporal information with shared weights. The number of layers in RMN signifies both the hierarchical processing depth and the temporal depth. The computational complexity of training RMN is significantly lower than that of deep recurrent networks.…
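The idea described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration and not the authors' exact formulation: the function names, the ReLU nonlinearity, the zero-padding at the start of the sequence, and the choice to share a single memory weight matrix across all layers are assumptions made for the example.

```python
import numpy as np


def delay(x: np.ndarray, d: int) -> np.ndarray:
    """Shift a (T, D) sequence by d frames so row t holds x[t - d].

    Frames before the start of the sequence are filled with zeros.
    """
    if d == 0:
        return x.copy()
    out = np.zeros_like(x)
    out[d:] = x[:-d]
    return out


def rmn_forward(x, layer_weights, memory_weight, delays):
    """Run a stack of feed-forward layers with residual and delay connections.

    x             : (T, D) input sequence, one row per frame
    layer_weights : list of (D, D) matrices, one per layer
    memory_weight : (D, D) matrix applied to the delayed input
                    (assumed here to be shared by every layer)
    delays        : list of per-layer delays, in frames
    """
    h = x
    for W, d in zip(layer_weights, delays):
        # Feed-forward term on the current frame plus a term on a past frame.
        pre = h @ W + delay(h, d) @ memory_weight
        # Residual connection: add the layer input back to its output.
        h = h + np.maximum(pre, 0.0)
    return h
```

With a delay of one frame in each layer, the output at frame t in this sketch depends on the inputs at frames t down to t - L for a stack of L layers, which mirrors the statement that the number of layers sets both the processing depth and the temporal depth. No recurrence is involved, so all frames can be processed in parallel.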
Taxonomy
Topics: Anomaly Detection Techniques and Applications · Digital Media Forensic Detection · Speech Recognition and Synthesis
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
