On orthogonality and learning recurrent networks with long term dependencies
Eugene Vorontsov, Chiheb Trabelsi, Samuel Kadoury, Chris Pal

TL;DR
This paper investigates how enforcing orthogonality in recurrent neural networks impacts training stability and convergence, proposing a matrix factorization method to control gradient behavior and analyzing the effects of hard constraints.
Contribution
It introduces a weight matrix factorization and parameterization strategy that bounds matrix norms, thereby controlling how much gradients can expand or contract during backpropagation, and examines how orthogonality constraints affect training dynamics.
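Below is a minimal sketch of such a parameterization. It assumes the factorization takes the singular value decomposition form W = U S V^T. U and V stay orthogonal because each is built as a product of Householder reflections. Each singular value is squashed into (1 - m, 1 + m) by a sigmoid of an unconstrained parameter, so the margin m bounds how far W can expand or contract any vector. The helper names (`factorized_weight`, `bounded_singular_values`, `margin`) and the exact sigmoid mapping are illustrative assumptions, not the authors' code.

```python
import math
import random

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def householder(v):
    """Reflection H = I - 2 v v^T / (v^T v); H is orthogonal for any nonzero v."""
    n = len(v)
    vv = sum(x * x for x in v)
    return [[(1.0 if i == j else 0.0) - 2.0 * v[i] * v[j] / vv for j in range(n)]
            for i in range(n)]

def orthogonal_from_reflections(vectors):
    """A product of Householder reflections is always an orthogonal matrix."""
    n = len(vectors[0])
    Q = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for v in vectors:
        Q = matmul(Q, householder(v))
    return Q

def bounded_singular_values(p, margin):
    """Map unconstrained p_i to s_i = 1 + margin * (2 * sigmoid(p_i) - 1),
    written via the identity 2 * sigmoid(x) - 1 = tanh(x / 2).
    Each s_i lies in (1 - margin, 1 + margin)."""
    return [1.0 + margin * math.tanh(x / 2.0) for x in p]

def factorized_weight(u_vecs, v_vecs, p, margin):
    """Hypothetical helper: W = U diag(s) V^T with orthogonal U, V and bounded s."""
    U = orthogonal_from_reflections(u_vecs)
    V = orthogonal_from_reflections(v_vecs)
    s = bounded_singular_values(p, margin)
    S = [[s[i] if i == j else 0.0 for j in range(len(s))] for i in range(len(s))]
    return matmul(matmul(U, S), transpose(V))

# Example: a 4x4 weight whose gain on any vector stays within (0.9, 1.1).
random.seed(0)
n, margin = 4, 0.1
u_vecs = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
v_vecs = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
p = [random.gauss(0.0, 3.0) for _ in range(n)]
W = factorized_weight(u_vecs, v_vecs, p, margin)
```

Because ||W x|| = ||S V^T x|| and V^T preserves length, the gain ||W x|| / ||x|| is bounded by the smallest and largest s_i. Setting the margin to 0 makes W exactly orthogonal (a hard constraint), while a larger margin relaxes it.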
Findings
Hard orthogonality constraints can slow convergence.
Orthogonality helps stabilize gradients during training.
Controlled matrix norm bounds improve training efficiency.
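The gradient-stability finding can be illustrated with a toy linear recurrence h_t = W h_{t-1}. Backpropagating through T steps of this recurrence multiplies the gradient by W^T T times. The sketch below ignores nonlinearities and uses a scaled 2x2 rotation as an assumed stand-in for a recurrent weight matrix. It compares an orthogonal W (scale 1.0) with slightly contractive (0.9) and expansive (1.1) ones. The function names are hypothetical.

```python
import math

def rotation(theta, scale=1.0):
    """2x2 rotation matrix multiplied by `scale`; orthogonal only when scale == 1."""
    c, s = math.cos(theta), math.sin(theta)
    return [[scale * c, -scale * s], [scale * s, scale * c]]

def backprop_norm(W, grad, steps):
    """Norm of `grad` after applying W^T `steps` times, as when backpropagating
    through the linear recurrence h_t = W h_{t-1} (nonlinearity ignored)."""
    for _ in range(steps):
        grad = [W[0][0] * grad[0] + W[1][0] * grad[1],
                W[0][1] * grad[0] + W[1][1] * grad[1]]
    return math.hypot(*grad)

g0 = [1.0, 0.0]
for scale in (0.9, 1.0, 1.1):
    print(scale, backprop_norm(rotation(0.3, scale), g0, 100))
```

Over 100 steps, the orthogonal matrix keeps the gradient norm at 1 (up to rounding). Scales of 0.9 and 1.1 shrink it to roughly 0.9^100 ≈ 2.7e-5 or grow it to roughly 1.1^100 ≈ 1.4e4. This is the vanishing and exploding gradient problem in miniature.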
Abstract
It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long-term dependencies. The vanishing or exploding gradient problem is a well-known issue associated with these challenges. One approach to addressing vanishing and exploding gradients is to use either soft or hard constraints on weight matrices so as to encourage or enforce orthogonality. Orthogonal matrices preserve gradient norm during backpropagation and may therefore be a desirable property. This paper explores issues with optimization convergence, speed and gradient stability when encouraging or enforcing orthogonality. To perform this analysis, we propose a weight matrix factorization and parameterization strategy through which we can bound matrix norms and thereby control the degree of expansivity induced during backpropagation. We find that hard constraints…
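A soft constraint of the kind mentioned in the abstract is commonly implemented as a penalty lam * ||W^T W - I||_F^2 added to the training loss. Its gradient with respect to W is 4 * lam * W (W^T W - I). The sketch below assumes that penalty form. The names `orthogonality_penalty` and `penalty_grad`, the weight `lam` and the step size `lr` are hypothetical. The sketch runs gradient descent on the penalty alone to show it pulling a 2x2 matrix toward orthogonality.

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def orthogonality_penalty(W):
    """Soft constraint ||W^T W - I||_F^2; zero exactly when W has orthonormal columns."""
    A = matmul(transpose(W), W)
    n = len(A)
    return sum((A[i][j] - (1.0 if i == j else 0.0)) ** 2
               for i in range(n) for j in range(n))

def penalty_grad(W):
    """Gradient of the penalty with respect to W: 4 W (W^T W - I)."""
    A = matmul(transpose(W), W)
    for i in range(len(A)):
        A[i][i] -= 1.0
    return [[4.0 * x for x in row] for row in matmul(W, A)]

W = [[1.5, 0.3], [0.2, 0.7]]
lam, lr = 1.0, 0.05  # hypothetical penalty weight and step size
for _ in range(200):
    G = penalty_grad(W)
    W = [[w - lr * lam * g for w, g in zip(rw, rg)] for rw, rg in zip(W, G)]
print(orthogonality_penalty(W))
```

In an actual RNN, this gradient would be added to the task-loss gradient. Orthogonality is then only encouraged, and the penalty weight trades off how far W may drift from it. A hard constraint instead keeps W orthogonal at every step, for example through a factorization like the one described under Contribution with the margin set to 0.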
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Machine Learning and ELM · Advanced Neural Network Applications
