Efficient Riemannian Optimization on the Stiefel Manifold via the Cayley Transform
Jun Li, Li Fuxin, Sinisa Todorovic

TL;DR
This paper introduces Riemannian optimization algorithms on the Stiefel manifold built on the Cayley transform. They lower per-iteration running time and speed up convergence when training deep networks whose parameter matrices are constrained to be orthonormal.
Contribution
It proposes a retraction map based on an iterative Cayley transform and an implicit vector transport mechanism that combines a projection of the momentum with the Cayley transform. Two optimizers are built on these: Cayley SGD with momentum and Cayley ADAM.
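The reason a Cayley transform suits this constraint can be checked in a few lines. The derivation below is a sketch in assumed notation (the summary itself defines no symbols): X is an n x p matrix with orthonormal columns, W is a skew-symmetric n x n matrix, and α is a step size. It shows that the transformed point stays on the Stiefel manifold.

```latex
% Assumed notation: X^T X = I, W^T = -W, alpha > 0.
\begin{align*}
Y(\alpha) &= \Bigl(I - \tfrac{\alpha}{2} W\Bigr)^{-1} \Bigl(I + \tfrac{\alpha}{2} W\Bigr) X \\
Y(\alpha)^{\top} Y(\alpha)
  &= X^{\top} \Bigl(I + \tfrac{\alpha}{2} W\Bigr)^{\top} \Bigl(I - \tfrac{\alpha}{2} W\Bigr)^{-\top}
     \Bigl(I - \tfrac{\alpha}{2} W\Bigr)^{-1} \Bigl(I + \tfrac{\alpha}{2} W\Bigr) X \\
  &= X^{\top} \Bigl(I - \tfrac{\alpha}{2} W\Bigr) \Bigl(I + \tfrac{\alpha}{2} W\Bigr)^{-1}
     \Bigl(I - \tfrac{\alpha}{2} W\Bigr)^{-1} \Bigl(I + \tfrac{\alpha}{2} W\Bigr) X
     && \text{since } W^{\top} = -W \\
  &= X^{\top} X = I
     && \text{since all four factors are functions of } W \text{ and commute}
\end{align*}
```

Multiplying the first line through by (I − α/2 W) gives the equivalent fixed-point form Y = X + (α/2) W (X + Y), which can be iterated without a matrix inverse.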
Findings
Faster training times for CNNs and RNNs with orthonormal constraints.
Lower per-iteration running time than existing approaches that enforce orthonormality.
Faster convergence rates without performance loss.
Abstract
Strictly enforcing orthonormality constraints on parameter matrices has been shown advantageous in deep learning. This amounts to Riemannian optimization on the Stiefel manifold, which, however, is computationally expensive. To address this challenge, we present two main contributions: (1) A new efficient retraction map based on an iterative Cayley transform for optimization updates, and (2) An implicit vector transport mechanism based on the combination of a projection of the momentum and the Cayley transform on the Stiefel manifold. We specify two new optimization algorithms: Cayley SGD with momentum, and Cayley ADAM on the Stiefel manifold. Convergence of Cayley SGD is theoretically analyzed. Our experiments for CNN training demonstrate that both algorithms: (a) Use less running time per iteration relative to existing approaches that enforce orthonormality of CNN parameters; and (b)…
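The two contributions named in the abstract can be illustrated with a short NumPy sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the default hyperparameters (`lr`, `beta`, `q`, `n_iter`), the use of the Frobenius norm in the step-size clamp, and the convention that X is n x p with orthonormal columns are all choices made here. The iterative retraction uses the fixed-point form of the Cayley transform, Y = X + (α/2) W (X + Y), so it needs only matrix products; the closed-form version is included for comparison.

```python
import numpy as np


def skew_from_direction(X, Z):
    """Skew-symmetric W such that W @ X is the tangent-space projection of Z at X.

    Assumes X has orthonormal columns (X.T @ X = I).
    """
    W_hat = Z @ X.T - 0.5 * X @ (X.T @ Z @ X.T)
    return W_hat - W_hat.T


def cayley_exact(X, W, alpha):
    """Closed-form Cayley transform; requires an n x n linear solve."""
    I = np.eye(W.shape[0])
    return np.linalg.solve(I - 0.5 * alpha * W, (I + 0.5 * alpha * W) @ X)


def cayley_iterative(X, W, alpha, n_iter=2):
    """Fixed-point approximation of the Cayley transform (matrix products only)."""
    Y = X + alpha * (W @ X)  # first-order initial guess
    for _ in range(n_iter):
        Y = X + 0.5 * alpha * (W @ (X + Y))
    return Y


def cayley_sgd_momentum_step(X, M, G, lr=0.1, beta=0.9, q=0.5, n_iter=2, eps=1e-8):
    """One hypothetical momentum step: project the momentum, then retract.

    X: current point, M: momentum buffer, G: Euclidean gradient (all n x p).
    """
    M = beta * M - G                      # accumulate momentum in ambient space
    W = skew_from_direction(X, M)         # skew-symmetric generator of the update
    M = W @ X                             # momentum projected onto the tangent space at X
    # Clamp the step so that (alpha / 2) * ||W|| <= q < 1, which makes the
    # fixed-point iteration a contraction.
    alpha = min(lr, 2.0 * q / (np.linalg.norm(W) + eps))
    X_new = cayley_iterative(X, W, alpha, n_iter)
    return X_new, M
```

With a small `n_iter` the result is only approximately orthonormal; raising `n_iter` drives it toward the closed-form Cayley transform, which preserves orthonormality exactly up to floating-point error.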
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Generative Adversarial Networks and Image Synthesis · Advanced Neural Network Applications
Methods: Adam · Stochastic Gradient Descent
