TL;DR
This paper introduces Feedback Gradient Descent (FGD), an optimization method for deep neural networks that aims for both high efficiency and stability by enforcing orthogonality through an Euler discretization of a continuous-time dynamical system on the tangent bundle of the Stiefel manifold.
Contribution
To the authors' knowledge, FGD is the first method to show efficiency and stability simultaneously in orthogonal DNN training, using a dynamical system on the tangent bundle of the Stiefel manifold.
Findings
FGD outperforms existing methods in accuracy.
FGD demonstrates superior efficiency in training.
FGD provides enhanced stability during optimization.
Abstract
Optimization with orthogonality constraints has been shown to be useful in training deep neural networks (DNNs). When imposing orthogonality on DNNs, both computational efficiency and stability are important. However, existing methods that use Riemannian optimization or hard constraints ensure only stability, while those that use soft constraints improve only efficiency. In this paper, we propose a novel method, named Feedback Gradient Descent (FGD), which is, to our knowledge, the first to show high efficiency and stability simultaneously. FGD induces orthogonality through a simple yet indispensable Euler discretization of a continuous-time dynamical system on the tangent bundle of the Stiefel manifold. In particular, inspired by Feedback Integrators, a numerical integration method on manifolds, we instantiate that method on the tangent bundle of the Stiefel manifold for the first time. In…
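The abstract's central idea, an explicit Euler step applied to a flow that is steered back toward the Stiefel manifold by a feedback term, can be illustrated with a small NumPy sketch. This is not the paper's FGD update: FGD is described as operating on the tangent bundle (weights together with a velocity), whereas the toy below is a first-order flow on the weights only. The function names, the quadratic loss, the `step` and `gain` values, and the choice of feedback potential (1/4)·||WᵀW − I||² are all assumptions made for illustration.

```python
import numpy as np


def orthogonality_error(W):
    """Frobenius norm of W^T W - I; it is zero exactly on the Stiefel manifold."""
    return np.linalg.norm(W.T @ W - np.eye(W.shape[1]))


def feedback_euler_step(W, grad, step=0.1, gain=1.0):
    """One explicit Euler step of a toy feedback-stabilized gradient flow.

    Hypothetical helper for illustration; not the update rule from the paper.
    """
    # Tangent-space projection of the Euclidean gradient
    # (an exact projection only when W^T W = I).
    sym = 0.5 * (W.T @ grad + grad.T @ W)
    tangent_grad = grad - W @ sym
    # Feedback term: gradient of (1/4) * ||W^T W - I||_F^2,
    # which pulls W back toward the manifold when it drifts.
    feedback = W @ (W.T @ W - np.eye(W.shape[1]))
    return W - step * (tangent_grad + gain * feedback)


def run_demo(num_steps=1000, seed=0):
    """Minimize f(W) = -0.5 * trace(W^T A W) from a slightly non-orthogonal start."""
    rng = np.random.default_rng(seed)
    A = np.diag([1.0, 0.75, 0.5, 0.25, 0.125, 0.125, 0.125, 0.125])
    Q, _ = np.linalg.qr(rng.standard_normal((8, 2)))
    W = Q + 0.1 * rng.standard_normal((8, 2))  # perturb off the manifold
    initial_error = orthogonality_error(W)
    for _ in range(num_steps):
        grad = -A @ W  # Euclidean gradient of the assumed toy loss
        W = feedback_euler_step(W, grad)
    return initial_error, orthogonality_error(W)


if __name__ == "__main__":
    before, after = run_demo()
    print(f"orthogonality error: start={before:.3e}, end={after:.3e}")
```

In this sketch no retraction, QR factorization, or matrix inverse is needed inside the loop; the only extra cost per step is a few matrix products, and the feedback term keeps the iterate near WᵀW = I. That is the intuition behind pairing a cheap Euler step with a stabilizing feedback term, under the assumptions stated above.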
