A Simple Way to Initialize Recurrent Networks of Rectified Linear Units

Quoc V. Le; Navdeep Jaitly; Geoffrey E. Hinton

arXiv:1504.00941 · cs.NE · April 9, 2015 · 555 cites

TL;DR

This paper introduces a straightforward initialization method for recurrent neural networks built from ReLU units: the recurrent weight matrix is initialized to the identity matrix (or a scaled identity). The resulting networks perform comparably to LSTM on four benchmarks.

Contribution

The paper proposes a simple initialization technique for ReLU-based RNNs that improves the learning of long-term dependencies without resorting to sophisticated optimization techniques or complex network architectures.
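As a minimal illustration of why this helps, assume a plain (ungated) ReLU recurrence with recurrent matrix W, input-to-hidden weights U and bias b (this notation is introduced here for illustration, not taken from the text above). The Jacobian of the hidden state across time is then:

```latex
\begin{aligned}
h_t &= \max(0,\; a_t), \qquad a_t = W h_{t-1} + U x_t + b,\\
\frac{\partial h_t}{\partial h_{t-1}} &= D_t W, \qquad D_t = \operatorname{diag}\big(\mathbb{1}[a_t > 0]\big),\\
\frac{\partial h_T}{\partial h_k} &= D_T W \, D_{T-1} W \cdots D_{k+1} W
\;\overset{W = \sigma I}{=}\; \sigma^{T-k} \prod_{t=k+1}^{T} D_t .
\end{aligned}
```

With σ = 1 (the plain identity), each diagonal entry is 1 if the unit stays active over the whole interval and 0 otherwise, so backpropagated error along active units is neither attenuated nor amplified; with σ < 1 it decays as σ^{T-k}. This argument holds only at initialization, before training moves W away from σI.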

Findings

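01 Comparable performance to LSTM on four benchmarks
02 Effective on two toy problems involving long-range temporal structure
03 Simplifies recurrent network training: no sophisticated optimization technique or network architecture is required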

Abstract

Learning long-term dependencies in recurrent networks is difficult due to vanishing and exploding gradients. To overcome this difficulty, researchers have developed sophisticated optimization techniques and network architectures. In this paper, we propose a simpler solution that uses recurrent neural networks composed of rectified linear units. Key to our solution is the use of the identity matrix or its scaled version to initialize the recurrent weight matrix. We find that our solution is comparable to LSTM on our four benchmarks: two toy problems involving long-range temporal structures, a large language modeling problem and a benchmark speech recognition problem.
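The initialization described in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the helper names `irnn_init` and `relu_rnn_step` are hypothetical, and the zero hidden biases, the small Gaussian input weights (`input_std`) and the `seed` parameter are assumptions made for this sketch. The `scale` parameter covers the "scaled version" of the identity.

```python
import random


def irnn_init(hidden_size, input_size, scale=1.0, input_std=0.001, seed=0):
    """Hypothetical helper sketching identity initialization for a ReLU RNN.

    Recurrent weights W = scale * I. Assumptions for this sketch only:
    hidden biases b = 0 and input-to-hidden weights U ~ N(0, input_std**2).
    """
    rng = random.Random(seed)
    W = [[scale if i == j else 0.0 for j in range(hidden_size)]
         for i in range(hidden_size)]
    U = [[rng.gauss(0.0, input_std) for _ in range(input_size)]
         for _ in range(hidden_size)]
    b = [0.0] * hidden_size
    return W, U, b


def relu_rnn_step(h, x, W, U, b):
    """One step of the plain ReLU recurrence h_t = max(0, W h_{t-1} + U x_t + b)."""
    h_next = []
    for i in range(len(h)):
        a = b[i]
        a += sum(W[i][j] * h[j] for j in range(len(h)))
        a += sum(U[i][k] * x[k] for k in range(len(x)))
        h_next.append(max(0.0, a))
    return h_next


# Usage: with W = I, b = 0 and zero input, a non-negative hidden state is
# copied forward unchanged, so information can persist over many steps.
W, U, b = irnn_init(hidden_size=4, input_size=2)
h = [0.5, 1.0, 0.0, 2.0]
for _ in range(1000):
    h = relu_rnn_step(h, [0.0, 0.0], W, U, b)
print(h)  # prints [0.5, 1.0, 0.0, 2.0]
```

Passing `scale` below 1 gives the scaled-identity variant; in this sketch the hidden state then shrinks geometrically when the input is zero, trading perfect memory for more stable dynamics.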

Taxonomy

Topics: Topic Modeling · Speech Recognition and Synthesis · Natural Language Processing Techniques

Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory