Improving performance of recurrent neural network with relu nonlinearity
Sachin S. Talathi, Aniket Vartak

TL;DR
This paper improves recurrent neural network training with ReLU nonlinearities by proposing a modified weight initialization strategy, leading to better performance on sequence learning tasks involving long-range dependencies.
Contribution
It introduces a new weight initialization method for ReLU-based RNNs, enhancing training stability and performance for long-range sequence tasks.
Findings
Successful training of ReLU RNNs with the proposed initialization
Comparable or improved results on toy sequence problems
Effective application to a benchmark action recognition task
Abstract
In recent years, significant progress has been made in successfully training recurrent neural networks (RNNs) on sequence learning problems involving long-range temporal dependencies. The progress has been made on three fronts: (a) algorithmic improvements involving sophisticated optimization techniques, (b) network design involving complex hidden layer nodes and specialized recurrent layer connections, and (c) weight initialization methods. In this paper, we focus on the recently proposed weight initialization with the identity matrix for the recurrent weights in an RNN. This initialization is specifically proposed for hidden nodes with Rectified Linear Unit (ReLU) nonlinearity. We offer a simple dynamical systems perspective on the weight initialization process, which allows us to propose a modified weight initialization strategy. We show that this initialization technique leads to successfully…
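As an illustration of the identity-matrix initialization the abstract refers to, here is a minimal sketch in plain Python. It covers only the existing identity scheme, not the paper's modified strategy, which the truncated abstract does not specify. The function names, the input-weight scale of 0.001, and the zero bias are assumptions made for this example. The point it shows: with identity recurrent weights, zero input, and zero bias, a ReLU layer carries any non-negative hidden state forward unchanged, which is one way to see why this starting point suits long-range dependencies.

```python
import random

def identity(n):
    """Return an n x n identity matrix as a list of rows."""
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def init_identity_rnn(n_in, n_hidden, in_scale=0.001, seed=0):
    """Hypothetical helper: identity recurrent weights, small random
    input weights, zero bias. in_scale is an assumed value."""
    rng = random.Random(seed)
    W_hh = identity(n_hidden)
    W_xh = [[rng.gauss(0.0, in_scale) for _ in range(n_in)]
            for _ in range(n_hidden)]
    b = [0.0] * n_hidden
    return W_hh, W_xh, b

def relu_rnn_step(W_hh, W_xh, b, h, x):
    """One step of h_t = ReLU(W_hh h_{t-1} + W_xh x_t + b)."""
    rec = matvec(W_hh, h)
    inp = matvec(W_xh, x)
    return [max(0.0, r + i + bi) for r, i, bi in zip(rec, inp, b)]

W_hh, W_xh, b = init_identity_rnn(n_in=3, n_hidden=4)

# Start from a non-negative hidden state and feed zero input for 100 steps.
h0 = [0.5, 0.0, 2.0, 1.25]
h = h0
for _ in range(100):
    h = relu_rnn_step(W_hh, W_xh, b, h, [0.0, 0.0, 0.0])

print(h == h0)  # prints True: the state is preserved at initialization
```

From a dynamical systems viewpoint, every eigenvalue of the identity matrix equals 1, so at initialization the recurrent map neither shrinks nor amplifies the hidden state.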
Taxonomy
Topics: Neural Networks and Applications · Domain Adaptation and Few-Shot Learning · Neural Networks and Reservoir Computing
