MomentumRNN: Integrating Momentum into Recurrent Neural Networks

Tan M. Nguyen; Richard G. Baraniuk; Andrea L. Bertozzi; Stanley J. Osher; Bao Wang

arXiv:2006.06919·cs.LG·December 14, 2021·6 cites

TL;DR

MomentumRNN integrates momentum into the hidden-state update of recurrent neural networks. This alleviates vanishing gradients and speeds up convergence across a range of RNN architectures.

Contribution

The paper connects the hidden-state dynamics of an RNN to gradient descent, then adds momentum to that formulation to obtain a new family of RNNs, MomentumRNNs. The same framework also accommodates other momentum-based optimization methods.

Findings

01 MomentumRNN alleviates the vanishing gradient issue.

02 MomentumLSTM improves convergence speed and accuracy over its LSTM counterpart.

03 The framework is compatible with various RNN cells and optimization methods.

Abstract

Designing deep neural networks is an art that often involves an expensive search over candidate architectures. To overcome this for recurrent neural nets (RNNs), we establish a connection between the hidden state dynamics in an RNN and gradient descent (GD). We then integrate momentum into this framework and propose a new family of RNNs, called MomentumRNNs. We theoretically prove and numerically demonstrate that MomentumRNNs alleviate the vanishing gradient issue in training RNNs. We study the momentum long short-term memory (MomentumLSTM) and verify its advantages in convergence speed and accuracy over its LSTM counterpart across a variety of benchmarks. We also demonstrate that MomentumRNN is applicable to many types of recurrent cells, including those in the state-of-the-art orthogonal RNNs. Finally, we show that other advanced momentum-based optimization methods, such as Adam…
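
The idea of adding momentum to the hidden-state recurrence can be illustrated with a small sketch. The abstract above does not spell out the update rule, so the form used here is an assumption: a velocity vector accumulates the projected input with momentum `mu` and step size `s`, and the velocity then enters the hidden-state update in place of the raw input term. The names `momentum_rnn_forward`, `matvec`, `mu`, and `s`, and the default values, are hypothetical and chosen only for illustration.

```python
import math


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m_ij * x_j for m_ij, x_j in zip(row, x)) for row in M]


def momentum_rnn_forward(xs, U, W, mu=0.6, s=0.6):
    """Run a momentum-augmented recurrent cell over a sequence (sketch).

    Assumed update (not stated on this page):
        v_t = mu * v_{t-1} + s * U x_t
        h_t = tanh(W h_{t-1} + v_t)
    With mu = 0 and s = 1 this reduces to a plain tanh RNN without bias.
    """
    hidden_dim = len(W)
    h = [0.0] * hidden_dim  # hidden state
    v = [0.0] * hidden_dim  # velocity (momentum) state
    states = []
    for x in xs:
        Ux = matvec(U, x)
        # Velocity accumulates the input projection, like a momentum buffer.
        v = [mu * v_i + s * u_i for v_i, u_i in zip(v, Ux)]
        Wh = matvec(W, h)
        h = [math.tanh(a + b) for a, b in zip(Wh, v)]
        states.append(h)
    return states


# Tiny illustrative run: 3 time steps, 2 inputs, 2 hidden units.
demo_xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
demo_U = [[0.5, -0.2], [0.1, 0.3]]
demo_W = [[0.0, 0.4], [-0.4, 0.0]]
demo_states = momentum_rnn_forward(demo_xs, demo_U, demo_W)
```

Under this assumed form, setting `mu=0.0` and `s=1.0` recovers an ordinary recurrent cell, which makes the momentum term easy to ablate when comparing the two.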

Code & Models

Videos

MomentumRNN: Integrating Momentum into Recurrent Neural Networks · SlidesLive

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Advanced Neural Network Applications · Stochastic Gradient Optimization Techniques

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adam · Sigmoid Activation · Tanh Activation · Long Short-Term Memory