Dual Gauss-Newton Directions for Deep Learning

Vincent Roulet; Mathieu Blondel

arXiv:2308.08886 · cs.LG · October 30, 2023

Open Access

TL;DR

This paper introduces dual Gauss-Newton directions for deep learning optimization: direction oracles that exploit the structure of deep learning objectives (a convex loss composed with a nonlinear network) and are computed through a dual formulation, as an alternative to plain stochastic gradients.

Contribution

It proposes computing Gauss-Newton-like direction oracles through their dual formulation, which brings computational benefits and yields descent directions usable as a drop-in replacement for stochastic gradients.
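
To make the primal/dual relationship concrete, here is a sketch based on standard Fenchel duality. The notation (f_i, J_i, \ell_i, \gamma, \alpha_i) and the quadratic regularizer are assumptions chosen for illustration; the paper's exact formulation and scaling may differ.

```latex
% Partially linearized (primal) subproblem at parameters w:
% the network outputs f_i are linearized, the convex losses \ell_i are kept intact.
\min_{d \in \mathbb{R}^p} \; \sum_{i=1}^{n} \ell_i\bigl(f_i(w) + J_i d\bigr) + \frac{1}{2\gamma}\|d\|_2^2,
\qquad J_i = \partial f_i(w) \in \mathbb{R}^{k \times p}.

% Writing \ell_i(u) = \max_{\alpha_i} \langle \alpha_i, u \rangle - \ell_i^*(\alpha_i)
% and minimizing over d in closed form gives the dual problem:
\max_{\alpha_1, \dots, \alpha_n \in \mathbb{R}^k} \;
\sum_{i=1}^{n} \bigl[ \langle \alpha_i, f_i(w) \rangle - \ell_i^*(\alpha_i) \bigr]
- \frac{\gamma}{2} \Bigl\| \sum_{i=1}^{n} J_i^\top \alpha_i \Bigr\|_2^2.

% The primal direction is recovered from a dual solution \alpha^\star by
d^\star = -\gamma \sum_{i=1}^{n} J_i^\top \alpha_i^\star.
```

In this sketch the dual has n·k variables (one vector per example in the network's output space) rather than p parameters, which is one plausible source of the computational benefit when k is much smaller than p.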

Findings

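1. The resulting oracles define descent directions that can serve as a drop-in replacement for stochastic gradients in existing optimization algorithms.

2. Computing the directions through their dual formulation brings computational benefits and new insights.

3. Experiments examine the advantage of the dual formulation and the computational trade-offs involved in computing such oracles.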

Abstract

Inspired by Gauss-Newton-like methods, we study the benefit of leveraging the structure of deep learning objectives, namely, the composition of a convex loss function and of a nonlinear network, in order to derive better direction oracles than stochastic gradients, based on the idea of partial linearization. In a departure from previous works, we propose to compute such direction oracles via their dual formulation, leading to both computational benefits and new insights. We demonstrate that the resulting oracles define descent directions that can be used as a drop-in replacement for stochastic gradients, in existing optimization algorithms. We empirically study the advantage of using the dual formulation as well as the computational trade-offs involved in the computation of such oracles.
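
The idea of partial linearization and its dual can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a squared loss, a hypothetical one-unit model `tanh(<w, x>)` with a hand-written Jacobian, exactly two examples, and a regularization weight `gamma`; all names are illustrative. For the squared loss, the dual reduces to the small linear system `(I + gamma * J J^T) alpha = r`, and the direction is `d = -gamma * J^T alpha`.

```python
import math

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def model(w, x):
    # Hypothetical stand-in for a network: scalar output tanh(<w, x>).
    return math.tanh(dot(w, x))

def jac_row(w, x):
    # Gradient of the scalar output with respect to w: (1 - tanh^2) * x.
    s = 1.0 - model(w, x) ** 2
    return [s * xi for xi in x]

def dual_gn_direction(w, xs, ys, gamma):
    """Gauss-Newton direction for the squared loss, computed via the dual.

    Assumes exactly two examples, so the 2x2 dual system
    (I + gamma * J J^T) alpha = r is solved with Cramer's rule.
    """
    J = [jac_row(w, x) for x in xs]                # n x p Jacobian (n = 2)
    r = [model(w, x) - y for x, y in zip(xs, ys)]  # residuals f_i(w) - y_i
    a11 = 1.0 + gamma * dot(J[0], J[0])
    a12 = gamma * dot(J[0], J[1])
    a22 = 1.0 + gamma * dot(J[1], J[1])
    det = a11 * a22 - a12 * a12
    alpha = [(a22 * r[0] - a12 * r[1]) / det,
             (a11 * r[1] - a12 * r[0]) / det]
    # Map the dual solution back to parameter space: d = -gamma * J^T alpha.
    d = [-gamma * (J[0][k] * alpha[0] + J[1][k] * alpha[1])
         for k in range(len(w))]
    return d, J, r

def primal_optimality_residual(d, J, r, gamma):
    # Stationarity of the primal subproblem
    #   min_d 0.5 * sum_i (r_i + J_i d)^2 + ||d||^2 / (2 * gamma),
    # i.e. J^T (J d + r) + d / gamma, which should vanish at the solution.
    u = [dot(row, d) + ri for row, ri in zip(J, r)]
    return [sum(J[i][k] * u[i] for i in range(len(J))) + d[k] / gamma
            for k in range(len(d))]

w = [0.5, -0.3, 0.8]
xs = [[1.0, 2.0, -1.0], [0.5, -1.0, 2.0]]
ys = [1.0, -1.0]
gamma = 0.5

d, J, r = dual_gn_direction(w, xs, ys, gamma)
residual = primal_optimality_residual(d, J, r, gamma)

# Gradient of 0.5 * sum_i (f_i(w) - y_i)^2 is J^T r; a descent direction
# has a negative inner product with it.
grad = [sum(J[i][k] * r[i] for i in range(len(J))) for k in range(len(w))]
slope = dot(grad, d)
```

The dual system here is 2x2 (one variable per example output) while a direct primal solve would involve a 3x3 system in the parameters; the gap widens as the parameter count grows. The `residual` and `slope` values give a quick check that the dual solution satisfies the primal stationarity condition and points downhill.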

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Stochastic Gradient Optimization Techniques · Gaussian Processes and Bayesian Inference