Orthogonal Gated Recurrent Unit with Neumann-Cayley Transformation
Edison Mucllari, Vasily Zadorozhnyy, Cole Pospisil, Duc Nguyen, Qiang Ye

TL;DR
This paper introduces NC-GRU, an orthogonal gated recurrent unit utilizing a Neumann series-based Cayley transformation, which effectively prevents exploding gradients and improves long-term memory in RNNs, outperforming standard GRU.
Contribution
The paper proposes a novel orthogonal matrix parameterization for GRU based on a Neumann series-based Cayley transformation, aimed at improving training stability and performance.
Findings
NC-GRU outperforms standard GRU on several synthetic and real-world tasks.
Orthogonal matrices help prevent exploding gradients.
Neumann-Cayley transformation improves training stability.
Abstract
In recent years, orthogonal matrices have been shown to be a promising approach to improving the training, stability, and convergence of Recurrent Neural Networks (RNNs), particularly by controlling gradients. While Gated Recurrent Unit (GRU) and Long Short Term Memory (LSTM) architectures address the vanishing gradient problem by using a variety of gates and memory cells, they are still prone to the exploding gradient problem. In this work, we analyze the gradients in GRU and propose the use of orthogonal matrices to prevent the exploding gradient problem and enhance long-term memory. We study where to use orthogonal matrices, and we propose a Neumann series-based Scaled Cayley transformation for training orthogonal matrices in GRU, which we call Neumann-Cayley Orthogonal GRU, or simply NC-GRU. We present detailed experiments of our model on several synthetic and real-world tasks, which…
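To make the idea of a Neumann series-based scaled Cayley transformation more concrete, here is a minimal sketch in pure Python. It is not the authors' implementation. It assumes the commonly used scaled Cayley form W = (I + A)^{-1}(I - A)D, with A skew-symmetric and D diagonal with entries ±1, and it assumes the inverse is replaced by the truncated Neumann series (I + A)^{-1} ≈ Σ_{k=0}^{K} (-A)^k, which converges only when the norm of A is below 1. All function names, the example matrix, and the truncation orders are hypothetical and chosen for illustration.

```python
# Illustrative sketch only: names, matrix values, and orders are assumptions,
# not taken from the paper.

def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def add(X, Y, scale=1.0):
    """Return X + scale * Y, elementwise."""
    return [[x + scale * y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def neumann_inverse(A, order):
    """Approximate (I + A)^{-1} by sum_{k=0}^{order} (-A)^k (needs ||A|| < 1)."""
    n = len(A)
    neg_A = [[-a for a in row] for row in A]
    term = identity(n)   # current power (-A)^k, starting at k = 0
    total = identity(n)  # running partial sum
    for _ in range(order):
        term = matmul(term, neg_A)
        total = add(total, term)
    return total

def neumann_cayley(A, d, order):
    """Approximate W = (I + A)^{-1} (I - A) D, with D = diag(d)."""
    n = len(A)
    W = matmul(neumann_inverse(A, order), add(identity(n), A, scale=-1.0))
    # Right-multiplying by diag(d) scales column j by d[j].
    return [[W[i][j] * d[j] for j in range(n)] for i in range(n)]

def orthogonality_error(W):
    """Largest absolute entry of W^T W - I (zero for an exactly orthogonal W)."""
    n = len(W)
    G = matmul(transpose(W), W)
    I = identity(n)
    return max(abs(G[i][j] - I[i][j]) for i in range(n) for j in range(n))

# Small skew-symmetric example (A^T = -A) with norm well below 1.
A = [[0.0, 0.1, -0.2],
     [-0.1, 0.0, 0.05],
     [0.2, -0.05, 0.0]]
d = [1.0, -1.0, 1.0]  # diagonal of the assumed scaling matrix D

err_low = orthogonality_error(neumann_cayley(A, d, order=1))
err_high = orthogonality_error(neumann_cayley(A, d, order=10))
print("order 1 error:", err_low)
print("order 10 error:", err_high)
```

In this sketch, a higher truncation order brings W closer to an exactly orthogonal matrix, while each step needs only matrix products rather than an explicit matrix inverse. The truncation order is therefore a trade-off between cost and orthogonality error.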
Taxonomy
Topics: Machine Learning and ELM · Advanced Neural Network Applications · Neural Networks and Applications
Methods: Gated Recurrent Unit
