Stabilizing Gradients for Deep Neural Networks via Efficient SVD Parameterization
Jiong Zhang, Qi Lei, Inderjit S. Dhillon

TL;DR
This paper introduces an efficient SVD-based parametrization of RNN transition matrices to explicitly control singular values, effectively addressing vanishing and exploding gradients and improving training stability and performance.
Contribution
It proposes a novel spectral parametrization method for RNNs using Householder reflectors, enabling explicit singular value control without losing expressive power.
Findings
Solves the exploding gradient problem through explicit singular value control, and empirically mitigates vanishing gradients.
Faster convergence and better long-range dependency modeling.
Effective on synthetic and real datasets like MNIST and Penn Tree Bank.
Abstract
Vanishing and exploding gradients are two of the main obstacles in training deep neural networks, especially in capturing long-range dependencies in recurrent neural networks (RNNs). In this paper, we present an efficient parametrization of the transition matrix of an RNN that allows us to stabilize the gradients that arise in its training. Specifically, we parameterize the transition matrix by its singular value decomposition (SVD), which allows us to explicitly track and control its singular values. We attain efficiency by using tools that are common in numerical linear algebra, namely Householder reflectors for representing the orthogonal matrices that arise in the SVD. By explicitly controlling the singular values, our proposed Spectral-RNN method allows us to easily solve the exploding gradient problem and we observe that it empirically solves the vanishing gradient issue to a large…
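The idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the use of full-length Householder vectors, and the tanh squashing that keeps singular values in a band around 1 (`center`, `radius`) are all assumptions made for illustration. It builds W = U diag(sigma) V^T, with U and V formed as products of Householder reflectors, and checks that the singular values of W are exactly the ones that were chosen.

```python
import numpy as np

def householder_product(vectors):
    """Return the orthogonal matrix H(u_1) H(u_2) ... H(u_k),
    where H(u) = I - 2 u u^T / (u^T u) is a Householder reflector."""
    n = vectors.shape[1]
    Q = np.eye(n)
    for u in vectors:
        # Q <- Q H(u), computed without forming H(u) explicitly.
        Q = Q - 2.0 * np.outer(Q @ u, u) / (u @ u)
    return Q

def svd_parameterized_matrix(u_vecs, v_vecs, sigma_raw, center=1.0, radius=0.05):
    """Build W = U diag(sigma) V^T from unconstrained parameters.

    Assumption for illustration: singular values are squashed into
    (center - radius, center + radius) with tanh.
    """
    sigma = center + radius * np.tanh(sigma_raw)
    U = householder_product(u_vecs)
    V = householder_product(v_vecs)
    return U @ np.diag(sigma) @ V.T, sigma

rng = np.random.default_rng(0)
n = 6
W, sigma = svd_parameterized_matrix(
    rng.standard_normal((n, n)),  # Householder vectors for U
    rng.standard_normal((n, n)),  # Householder vectors for V
    rng.standard_normal(n),       # unconstrained singular value parameters
)

# The singular values of W match the chosen sigma (up to ordering).
singular_values = np.linalg.svd(W, compute_uv=False)
```

Because U and V are orthogonal for any nonzero Householder vectors, the spectrum of W is determined by `sigma` alone, so bounding `sigma` bounds the spectral norm of the transition matrix directly.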
Taxonomy
Topics: Neural Networks and Applications · Seismic Imaging and Inversion Techniques · Model Reduction and Neural Networks
Methods: Singular Value Decomposition Parameterization
