On orthogonality and learning recurrent networks with long term dependencies

Eugene Vorontsov; Chiheb Trabelsi; Samuel Kadoury; Chris Pal

arXiv:1702.00071 · cs.LG · October 13, 2017 · 116 cites

TL;DR

This paper investigates how enforcing orthogonality in recurrent neural networks impacts training stability and convergence, proposing a matrix factorization method to control gradient behavior and analyzing the effects of hard constraints.

Contribution

It introduces a weight matrix factorization and parameterization strategy that bounds matrix norms, and it examines how soft and hard orthogonality constraints affect training dynamics.

Findings

01. Hard orthogonality constraints can slow convergence.
02. Orthogonality helps stabilize gradients during training.
03. Controlled matrix norm bounds improve training efficiency.

Abstract

It is well known that it is challenging to train deep neural networks and recurrent neural networks for tasks that exhibit long term dependencies. The vanishing or exploding gradient problem is a well known issue associated with these challenges. One approach to addressing vanishing and exploding gradients is to use either soft or hard constraints on weight matrices so as to encourage or enforce orthogonality. Orthogonal matrices preserve gradient norm during backpropagation and may therefore be a desirable property. This paper explores issues with optimization convergence, speed and gradient stability when encouraging or enforcing orthogonality. To perform this analysis, we propose a weight matrix factorization and parameterization strategy through which we can bound matrix norms and therein control the degree of expansivity induced during backpropagation. We find that hard constraints…
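The norm-preservation claim and the idea of bounding matrix norms through a factorization can be illustrated with a small sketch. The code below is not the authors' implementation: it assumes a factorization of the form W = U diag(s) Vᵀ with orthogonal U and V, uses 2×2 rotation matrices as the orthogonal factors, and squashes unconstrained parameters with a sigmoid so that each singular value stays within `1 ± margin`. The function names, the `margin` parameter, and the sigmoid squashing are all illustrative assumptions.

```python
import math

def rotation(theta):
    """2x2 rotation matrix, used here as a simple orthogonal matrix."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(row) for row in zip(*a)]

def matvec(a, x):
    return [sum(a_ij * x_j for a_ij, x_j in zip(row, x)) for row in a]

def norm(x):
    return math.sqrt(sum(v * v for v in x))

def bounded_singular_values(params, margin):
    """Map unconstrained parameters into (1 - margin, 1 + margin).

    Assumption: sigmoid squashing is one possible way to enforce the bound.
    """
    return [1.0 + 2.0 * margin * (1.0 / (1.0 + math.exp(-p)) - 0.5)
            for p in params]

def build_weight(theta_u, theta_v, params, margin):
    """Assemble W = U diag(s) V^T from orthogonal U, V and bounded s."""
    u, v = rotation(theta_u), rotation(theta_v)
    s = bounded_singular_values(params, margin)
    diag = [[s[0], 0.0], [0.0, s[1]]]
    return matmul(matmul(u, diag), transpose(v))

margin = 0.1
w = build_weight(0.7, -1.3, [3.0, -2.0], margin)

# Stand-in for a gradient arriving at a linear layer h = W x;
# backpropagation multiplies it by W^T.
g = [0.6, -0.8]

# Purely orthogonal weights leave the gradient norm unchanged.
q_ratio = norm(matvec(transpose(rotation(0.7)), g)) / norm(g)

# Bounded singular values keep the norm change within 1 +/- margin.
ratio = norm(matvec(transpose(w), g)) / norm(g)
```

Setting `margin = 0` in this sketch recovers a strictly orthogonal matrix (a hard constraint), while a larger `margin` lets the gradient norm expand or contract by a controlled amount at each step.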

Code & Models

Repositories

veugene/spectre_release
none · Official

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Machine Learning and ELM · Advanced Neural Network Applications

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings