Dual Gauss-Newton Directions for Deep Learning
Vincent Roulet, Mathieu Blondel

TL;DR
This paper introduces dual Gauss-Newton directions for deep learning optimization. It exploits the structure of deep learning objectives (a convex loss composed with a nonlinear network) to build better descent directions than stochastic gradients.
Contribution
It proposes a novel dual formulation for Gauss-Newton-like directions, providing computational benefits and improved descent directions in deep learning optimization.
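To make the idea of a dual formulation concrete, the block below gives a standard Fenchel-duality sketch of a regularized, partially linearized problem. The notation is an assumption for illustration and may differ from the paper's: w for the parameters, f for the network, J for its Jacobian at w, ℓ for the convex loss with conjugate ℓ*, and γ > 0 for a regularization parameter.

```latex
% Assumed notation: w parameters, f(w) network outputs, J = \partial f(w) Jacobian,
% \ell convex loss with convex conjugate \ell^*, \gamma > 0 a regularization parameter.
\begin{align*}
d(w) &= \operatorname*{arg\,min}_{d} \; \ell\big(f(w) + J d\big) + \frac{1}{2\gamma}\|d\|_2^2
  && \text{(primal, parameter space)} \\
\alpha^\star &= \operatorname*{arg\,max}_{\alpha} \; -\ell^*(\alpha) + \langle \alpha, f(w) \rangle - \frac{\gamma}{2}\|J^\top \alpha\|_2^2
  && \text{(dual, output space)} \\
d(w) &= -\gamma\, J^\top \alpha^\star
  && \text{(primal-dual link)}
\end{align*}
```

In this sketch the dual variable α has the dimension of the network outputs rather than of the parameters, which is one plausible source of the computational benefit when the outputs are far fewer than the parameters.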
Findings
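The resulting oracles define descent directions that can replace stochastic gradients in existing optimization algorithms.
Computing the directions through their dual formulation brings computational benefits and new insights.
Experiments examine the advantage of the dual formulation and the computational trade-offs of computing such oracles.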
Abstract
Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely, the composition of a convex loss function and a nonlinear network, to derive better direction oracles than stochastic gradients, based on the idea of partial linearization. In a departure from previous works, we propose to compute such direction oracles via their dual formulation, leading to both computational benefits and new insights. We demonstrate that the resulting oracles define descent directions that can be used as a drop-in replacement for stochastic gradients in existing optimization algorithms. We empirically study the advantage of using the dual formulation as well as the computational trade-offs involved in the computation of such oracles.
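The claim that a partially linearized oracle yields a descent direction, and that it can be computed in dual form, can be checked numerically in the simplest case. The following is a minimal sketch and not the paper's implementation. It assumes a squared loss, a small dense random matrix standing in for the network Jacobian, and direct linear solves. The function names `primal_gn_direction` and `dual_gn_direction`, and the parameter `gamma`, are hypothetical.

```python
import numpy as np

def primal_gn_direction(J, r, gamma):
    """Solve min_d 0.5*||r + J d||^2 + ||d||^2 / (2*gamma) in parameter space.

    Requires a linear solve of size p x p, where p is the number of parameters.
    """
    p = J.shape[1]
    return -np.linalg.solve(J.T @ J + np.eye(p) / gamma, J.T @ r)

def dual_gn_direction(J, r, gamma):
    """Solve the dual problem in output space, then map back to parameters.

    For the squared loss the dual optimum satisfies (I + gamma * J J^T) alpha = r,
    a k x k system where k is the number of outputs, and d = -gamma * J^T alpha.
    """
    k = J.shape[0]
    alpha = np.linalg.solve(np.eye(k) + gamma * (J @ J.T), r)
    return -gamma * (J.T @ alpha)

rng = np.random.default_rng(0)
k, p = 5, 50                      # few outputs, many parameters (assumed sizes)
J = rng.standard_normal((k, p))   # stand-in for the network Jacobian at w
r = rng.standard_normal(k)        # residual f(w) - y for the squared loss
gamma = 0.1

d_primal = primal_gn_direction(J, r, gamma)
d_dual = dual_gn_direction(J, r, gamma)
grad = J.T @ r                    # gradient of 0.5*||f(w) - y||^2 w.r.t. w

agree = bool(np.allclose(d_primal, d_dual))    # both routes give the same direction
is_descent = bool(float(grad @ d_dual) < 0.0)  # negative slope along d
```

In this toy setting the dual route solves a 5 x 5 system instead of a 50 x 50 one. For a real network the Jacobian would typically not be materialized; products with J and its transpose would instead come from automatic differentiation, and the dual problem for a general convex loss would need an iterative solver rather than a closed-form solve.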
Taxonomy
Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference
