Unitary Evolution Recurrent Neural Networks
Martin Arjovsky, Amar Shah, Yoshua Bengio

TL;DR
This paper introduces an RNN architecture whose hidden-to-hidden weight matrix is unitary, so its eigenvalues have absolute value exactly 1. This sidesteps the vanishing and exploding gradients that make long-term dependencies hard to learn.
Contribution
It proposes a new parametrization of unitary matrices for RNNs, built by composing several structured matrices with learnable parameters, that avoids expensive computations (such as eigendecomposition) after each weight update.
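As a rough illustration of composing structured unitary building blocks, the sketch below chains diagonal phase matrices, Householder reflections, a unitary discrete Fourier transform, and a fixed permutation, then checks that the result preserves the norm of a vector. The choice of blocks, their order, and all names (`apply_unitary`, `random_params`, the `params` keys) are assumptions of this sketch and are not taken from the summary above. The DFT is written as a naive loop for readability; an FFT would be the natural replacement.

```python
import cmath
import math
import random

def diag_phase(thetas, h):
    """Multiply entry j by exp(i * theta_j): a diagonal unitary matrix."""
    return [cmath.exp(1j * t) * x for t, x in zip(thetas, h)]

def reflect(v, h):
    """Householder reflection h - 2 v <v, h> / ||v||^2, which is unitary."""
    inner = sum(vj.conjugate() * hj for vj, hj in zip(v, h))
    norm_sq = sum(abs(vj) ** 2 for vj in v)
    scale = 2 * inner / norm_sq
    return [hj - scale * vj for vj, hj in zip(v, h)]

def dft(h, inverse=False):
    """Unitary DFT as a naive O(n^2) loop (an FFT would do this in O(n log n))."""
    n = len(h)
    sign = 1 if inverse else -1
    return [
        sum(h[k] * cmath.exp(sign * 2j * math.pi * j * k / n) for k in range(n))
        / math.sqrt(n)
        for j in range(n)
    ]

def permute(perm, h):
    """Reorder the entries of h: a permutation matrix is unitary."""
    return [h[p] for p in perm]

def apply_unitary(params, h):
    """Apply the composed operator to h without ever forming an n x n matrix.
    The order of the blocks is an assumption of this sketch."""
    h = diag_phase(params["d1"], h)
    h = dft(h)
    h = reflect(params["r1"], h)
    h = permute(params["perm"], h)
    h = diag_phase(params["d2"], h)
    h = dft(h, inverse=True)
    h = reflect(params["r2"], h)
    h = diag_phase(params["d3"], h)
    return h

def norm(h):
    return math.sqrt(sum(abs(x) ** 2 for x in h))

def random_params(n, rng):
    """Hypothetical helper: random phases, reflection vectors, and permutation."""
    phases = lambda: [rng.uniform(-math.pi, math.pi) for _ in range(n)]
    vec = lambda: [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)
    return {"d1": phases(), "d2": phases(), "d3": phases(),
            "r1": vec(), "r2": vec(), "perm": perm}

rng = random.Random(0)
n = 8
params = random_params(n, rng)
h = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
out = apply_unitary(params, h)
print(norm(h), norm(out))  # the two norms should agree up to rounding error
```

Because every block is unitary for any parameter values, the product stays unitary after each gradient step, and the number of learnable parameters in this sketch grows linearly with `n`.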
Findings
Achieved state-of-the-art results on tasks with long-term dependencies.
Demonstrated the feasibility of complex-domain optimization for RNN training.
Provided a scalable method for unitary matrix parametrization.
Abstract
Recurrent neural networks (RNNs) are notoriously difficult to train. When the eigenvalues of the hidden-to-hidden weight matrix deviate from absolute value 1, optimization becomes difficult due to the well-studied issue of vanishing and exploding gradients, especially when trying to learn long-term dependencies. To circumvent this problem, we propose a new architecture that learns a unitary weight matrix, with eigenvalues of absolute value exactly 1. The challenge we address is that of parametrizing unitary matrices in a way that does not require expensive computations (such as eigendecomposition) after each weight update. We construct an expressive unitary weight matrix by composing several structured matrices that act as building blocks with parameters to be learned. Optimization with this parametrization becomes feasible only when considering hidden states in the complex domain. We…
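The effect of eigenvalue magnitude can be seen in a toy linear recurrence. The sketch below repeatedly multiplies a 2-dimensional vector by a rotation matrix scaled by a factor `scale`, so every eigenvalue has absolute value `scale`. It ignores inputs and nonlinearities, and the function names and the choice of 100 steps are assumptions made for illustration only.

```python
import math

def step(scale, angle, h):
    """Multiply h by scale * R(angle); both eigenvalues have modulus `scale`."""
    c, s = math.cos(angle), math.sin(angle)
    x, y = h
    return (scale * (c * x - s * y), scale * (s * x + c * y))

def norm_after(scale, steps, angle=0.3):
    """Norm of a unit vector after `steps` applications of the scaled rotation."""
    h = (1.0, 0.0)
    for _ in range(steps):
        h = step(scale, angle, h)
    return math.hypot(*h)

# Compare shrinking, norm-preserving, and growing recurrences over 100 steps.
norms = {scale: norm_after(scale, 100) for scale in (0.9, 1.0, 1.1)}
for scale, value in norms.items():
    print(f"scale={scale}: norm after 100 steps = {value:.3e}")
```

With `scale` below 1 the norm decays geometrically, with `scale` above 1 it grows geometrically, and only at exactly 1 (the orthogonal, real-valued analogue of a unitary matrix) is it preserved. Backpropagated gradients in a linear recurrence are multiplied by the same matrix at every step, so they behave the same way.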
Taxonomy
Topics: Neural Networks and Applications · Topic Modeling · Machine Learning and ELM
Methods: modReLU · RMSProp · Unitary RNN
