Overcoming the vanishing gradient problem in plain recurrent networks

Yuhuang Hu, Adrian Huber, Jithendar Anumula, and Shih-Chii Liu

arXiv:1801.06105 · cs.NE · July 8, 2019 · 83 citations

Open Access

TL;DR

This paper introduces the Recurrent Identity Network (RIN), a plain recurrent network that overcomes the vanishing gradient problem without gating mechanisms, achieving competitive results and faster convergence on sequence tasks.

Contribution

The paper presents RIN, a novel gating-free recurrent network that addresses vanishing gradients and matches the performance of gated models.

Findings

01

RIN is competitive with IRNNs and LSTMs on multiple sequence modeling benchmarks and converges faster in all tasks.

02

Small RIN models achieve 12%-67% higher accuracy on the Sequential and Permuted MNIST datasets.

03

RIN reaches state-of-the-art on the bAbI question answering dataset.

Abstract

Plain recurrent networks suffer greatly from the vanishing gradient problem, while Gated Neural Networks (GNNs) such as Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) deliver promising results in many sequence learning tasks through sophisticated network designs. This paper shows how we can address this problem in a plain recurrent network by analyzing the gating mechanisms in GNNs. We propose a novel network called the Recurrent Identity Network (RIN), which allows a plain recurrent network to overcome the vanishing gradient problem while training very deep models without the use of gates. We compare this model with IRNNs and LSTMs on multiple sequence modeling benchmarks. The RINs demonstrate competitive performance and converge faster in all tasks. Notably, small RIN models produce 12%-67% higher accuracy on the Sequential and Permuted MNIST datasets and reach…
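
The abstract does not spell out the RIN update rule, so the following is only a minimal sketch under a stated assumption: that the recurrent weight matrix is augmented with a fixed identity path and the activation is ReLU, i.e. h_t = ReLU(W x_t + U h_{t-1} + h_{t-1} + b). The function names (`rin_step`, `run_rin`, `init_params`) and the initialization scale are hypothetical and chosen for illustration; this is not the authors' implementation.

```python
import random


def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def rin_step(x_t, h_prev, W, U, b):
    """One recurrent step with an explicit identity path (assumed form):
    h_t = ReLU(W x_t + U h_{t-1} + h_{t-1} + b)
    """
    wx = matvec(W, x_t)
    uh = matvec(U, h_prev)
    pre = [a + c + h + bias for a, c, h, bias in zip(wx, uh, h_prev, b)]
    return relu(pre)


def run_rin(xs, W, U, b):
    """Run the cell over a sequence, starting from a zero hidden state."""
    h = [0.0] * len(b)
    for x_t in xs:
        h = rin_step(x_t, h, W, U, b)
    return h


def init_params(input_dim, hidden_dim, scale=0.01, seed=0):
    """Small random weights and zero bias (the scale is an assumption)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, scale) for _ in range(input_dim)]
         for _ in range(hidden_dim)]
    U = [[rng.gauss(0.0, scale) for _ in range(hidden_dim)]
         for _ in range(hidden_dim)]
    b = [0.0] * hidden_dim
    return W, U, b
```

With U set to zero, the identity path carries a non-negative hidden state forward unchanged, so the state is not repeatedly scaled down at each step; this is the intuition the sketch is meant to convey, with no gates involved.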

Taxonomy

Topics: Topic Modeling · Domain Adaptation and Few-Shot Learning · Multimodal Machine Learning Applications