Sven: Singular Value Descent as a Computationally Efficient Natural Gradient Method
Samuel Bright-Thonney, Thomas R. Harvey, Andre Lukas, Jesse Thaler

TL;DR
Sven is a new optimization algorithm for neural networks that efficiently approximates natural gradient descent by exploiting the decomposition of the loss into a sum over individual data points; it outperforms standard optimizers in regression tasks.
Contribution
The paper introduces Sven, a scalable natural gradient method that uses a truncated singular value decomposition (SVD) of the loss Jacobian and improves training speed and convergence over traditional first-order optimizers.
Findings
Sven outperforms Adam in regression tasks, converging faster and to lower loss.
Sven's computational overhead relative to SGD is only a factor of k, the number of singular directions retained in the truncated SVD.
Sven approximates natural gradient descent in over-parametrized regimes.
Abstract
We introduce Sven (Singular Value dEsceNt), a new optimization algorithm for neural networks that exploits the natural decomposition of loss functions into a sum over individual data points, rather than reducing the full loss to a single scalar before computing a parameter update. Sven treats each data point's residual as a separate condition to be satisfied simultaneously, using the Moore-Penrose pseudoinverse of the loss Jacobian to find the minimum-norm parameter update that best satisfies all conditions at once. In practice, this pseudoinverse is approximated via a truncated singular value decomposition, retaining only the most significant directions and incurring a computational overhead of only a factor of k relative to stochastic gradient descent. This is in comparison to traditional natural gradient methods, which scale as the square of the number of parameters. We show…
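The update described in the abstract can be sketched in NumPy as follows. This is a minimal illustration under our own assumptions, not the authors' implementation. We assume the step takes the form theta <- theta - lr * J_k^+ r, where r holds one residual per data point, J is the Jacobian of those residuals with respect to the parameters, and J_k^+ is the pseudoinverse rebuilt from the k largest singular values of J. The names `sven_step`, `residual_fn`, `jacobian_fn`, `lr`, and `rcond` are hypothetical. For simplicity the sketch calls the full `np.linalg.svd`, so it does not reproduce the factor-of-k cost claimed above; that would require a truncated or randomized SVD solver.

```python
import numpy as np


def sven_step(params, residual_fn, jacobian_fn, k, lr=1.0, rcond=1e-10):
    """One Sven-style update (illustrative sketch, not the authors' code).

    residual_fn(params) -> r, shape (n,): one residual per data point.
    jacobian_fn(params) -> J, shape (n, p): J[i, j] = d r_i / d params_j.
    k: number of singular directions kept (hypothetical truncation rule).
    """
    r = residual_fn(params)
    J = jacobian_fn(params)
    # Thin SVD: J = U @ diag(s) @ Vt, with s sorted in descending order.
    U, s, Vt = np.linalg.svd(J, full_matrices=False)
    # Keep at most k directions, skipping numerically zero singular values.
    keep = min(k, int(np.sum(s > rcond * s[0])))
    U_k, s_k, Vt_k = U[:, :keep], s[:keep], Vt[:keep]
    # Truncated pseudoinverse applied to r: J_k^+ r = V_k diag(1/s_k) U_k^T r.
    delta = Vt_k.T @ ((U_k.T @ r) / s_k)
    # Minimum-norm step that drives the linearized residuals r + J @ step toward zero.
    return params - lr * delta


# Usage: over-parameterized linear least squares (5 data points, 20 parameters).
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))
y = rng.standard_normal(5)


def residuals(theta):
    return X @ theta - y  # one residual per data point


def jacobian(theta):
    return X  # residuals are linear in theta, so J is constant


theta0 = np.zeros(20)
theta_full = sven_step(theta0, residuals, jacobian, k=5)   # keep all 5 directions
theta_trunc = sven_step(theta0, residuals, jacobian, k=2)  # keep only the top 2
print(np.linalg.norm(residuals(theta_full)), np.linalg.norm(residuals(theta_trunc)))
```

With all directions kept and `lr=1.0`, one step on this linear problem lands on the minimum-norm interpolating solution `np.linalg.pinv(X) @ y`; keeping only two directions reduces the residual without eliminating it. For a squared-error loss, J^+ r equals (J^T J)^+ J^T r, the Gauss-Newton step, which is one way to see the link to natural gradient descent.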
