Orthogonal Weight Normalization: Solution to Optimization over Multiple Dependent Stiefel Manifolds in Deep Neural Networks
Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li

TL;DR
This paper generalizes square orthogonal weight matrices to rectangular ones in feed-forward neural networks and introduces an orthogonal weight normalization method to train them, which stabilizes training and improves performance on image classification benchmarks.
Contribution
It generalizes orthogonal matrices to rectangular forms in neural networks and proposes a new normalization method to optimize over dependent Stiefel manifolds.
Findings
Improved accuracy on CIFAR and ImageNet datasets.
Reduced test error of wide residual networks on CIFAR-100.
Enhanced stability and regularization in deep neural network training.
Abstract
Orthogonal matrices have shown advantages in training Recurrent Neural Networks (RNNs), but such matrices are limited to being square for the hidden-to-hidden transformation in RNNs. In this paper, we generalize the square orthogonal matrix to an orthogonal rectangular matrix and formulate this problem in feed-forward Neural Networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM). We show that rectangular orthogonal matrices can stabilize the distribution of network activations and regularize FNNs. We also propose a novel orthogonal weight normalization method to solve OMDSM. In particular, it constructs an orthogonal transformation over proxy parameters to ensure the weight matrix is orthogonal and back-propagates gradient information through the transformation during training. To guarantee stability, we minimize the distortions between proxy parameters and canonical…
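The idea of building an orthogonal weight matrix from proxy parameters can be illustrated with a minimal NumPy sketch. It assumes one particular construction, symmetric orthogonalization W = (V Vᵀ)^(-1/2) V computed through an eigendecomposition; the abstract excerpt above does not spell out the exact transformation, so the function name `orthogonalize_rows`, the `eps` term, and the matrix shapes are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def orthogonalize_rows(V, eps=1e-8):
    """Map a proxy matrix V (n x d, n <= d) to W whose rows are orthonormal.

    Assumed construction: W = (V V^T)^(-1/2) V, i.e. symmetric
    orthogonalization of the rows of V. Hypothetical helper for illustration.
    """
    S = V @ V.T                                # n x n Gram matrix of the rows
    eigvals, eigvecs = np.linalg.eigh(S)       # S is symmetric, so eigh applies
    # Inverse square root of S; eps guards against tiny eigenvalues.
    inv_sqrt = eigvecs @ np.diag(1.0 / np.sqrt(eigvals + eps)) @ eigvecs.T
    return inv_sqrt @ V

rng = np.random.default_rng(0)
V = rng.standard_normal((4, 10))               # rectangular proxy parameters
W = orthogonalize_rows(V)                      # rectangular, row-orthonormal weights
```

In this sketch the optimizer would update the proxy `V`, while the layer uses `W`. In an automatic-differentiation framework the gradient with respect to `V` would be obtained by differentiating through the same transformation, which matches the abstract's description of back-propagating through it.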
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Human Pose and Action Recognition
Methods: Weight Normalization
