Improving performance of recurrent neural network with relu nonlinearity

Sachin S. Talathi; Aniket Vartak

arXiv:1511.03771 · cs.NE · June 24, 2016 · 66 cites



TL;DR

This paper improves recurrent neural network training with ReLU nonlinearities by proposing a modified weight initialization strategy, leading to better performance on sequence learning tasks involving long-range dependencies.

Contribution

It introduces a new weight initialization method for ReLU-based RNNs, enhancing training stability and performance for long-range sequence tasks.

Findings

1. Successful training of ReLU RNNs with the proposed initialization
2. Comparable or improved results on toy sequence problems
3. Effective application to a benchmark action recognition task

Abstract

In recent years, significant progress has been made in successfully training recurrent neural networks (RNNs) on sequence learning problems involving long-range temporal dependencies. The progress has been made on three fronts: (a) algorithmic improvements involving sophisticated optimization techniques, (b) network design involving complex hidden layer nodes and specialized recurrent layer connections, and (c) weight initialization methods. In this paper, we focus on the recently proposed weight initialization with the identity matrix for the recurrent weights in an RNN. This initialization is specifically proposed for hidden nodes with Rectified Linear Unit (ReLU) nonlinearity. We offer a simple dynamical systems perspective on the weight initialization process, which allows us to propose a modified weight initialization strategy. We show that this initialization technique leads to successfully…
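As a rough illustration of the identity initialization the abstract refers to, the following minimal sketch builds a tiny ReLU RNN in plain Python with the recurrent weight matrix set to the identity. It does not implement the paper's modified strategy, which the truncated abstract above does not specify. The helper names (`init_relu_rnn`, `rnn_step`), the Gaussian input weights, the `input_scale` value and the zero bias are all assumptions made for this example.

```python
import random


def relu(v):
    """Elementwise rectified linear unit on a list of floats."""
    return [max(0.0, x) for x in v]


def identity_matrix(n):
    """n x n identity, used here as the initial recurrent weight matrix."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def matvec(m, v):
    """Plain matrix-vector product for a list-of-lists matrix."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def init_relu_rnn(n_hidden, n_input, input_scale=0.01, seed=0):
    """Hypothetical helper: identity recurrent weights, small random
    input weights (an assumption), and zero bias (an assumption)."""
    rng = random.Random(seed)
    w_rec = identity_matrix(n_hidden)
    w_in = [[rng.gauss(0.0, input_scale) for _ in range(n_input)]
            for _ in range(n_hidden)]
    bias = [0.0] * n_hidden
    return w_rec, w_in, bias


def rnn_step(h_prev, x, w_rec, w_in, bias):
    """One recurrent step: h_t = relu(W_rec h_{t-1} + W_in x_t + b)."""
    rec = matvec(w_rec, h_prev)
    inp = matvec(w_in, x)
    return relu([r + i + b for r, i, b in zip(rec, inp, bias)])


# With zero input, the identity recurrence copies a non-negative hidden
# state forward unchanged, no matter how many steps are taken.
w_rec, w_in, bias = init_relu_rnn(n_hidden=4, n_input=2)
h_start = [0.5, 0.0, 1.25, 2.0]
h = list(h_start)
for _ in range(100):
    h = rnn_step(h, [0.0, 0.0], w_rec, w_in, bias)
```

In this sketch the hidden state after 100 steps equals the starting state, which is one way to read the dynamical-systems view mentioned in the abstract: at initialization the recurrent map neither shrinks nor amplifies a non-negative state, so information can persist over long time spans.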


Taxonomy

Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Neural Networks and Reservoir Computing