Quasi-Newton's method in the class gradient defined high-curvature subspace

Mark Tuddenham; Adam Prügel-Bennett; Jonathan Hare

arXiv:2012.01938·cs.LG·April 6, 2021·5 cites



Open Access

TL;DR

This paper investigates the use of Quasi-Newton's method within the high-curvature subspace of deep learning loss landscapes and shows that a naive implementation slows convergence rather than speeding it up.

Contribution

It examines applying Quasi-Newton's method only in the high-curvature subspace of deep neural network loss landscapes, with stochastic gradient descent in the co-space, and analyzes the effectiveness of this approach.

Findings

01

Naive implementation of Quasi-Newton's method slows convergence.

02

The high-curvature subspace corresponds to the subspace spanned by the logit gradients for each class.

03

Potential for combining Newton's method in this subspace with SGD elsewhere.

Abstract

Classification problems using deep learning have been shown to have a high-curvature subspace in the loss landscape equal in dimension to the number of classes. Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class. An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the co-space. We show that a naive implementation actually slows down convergence and we speculate why this might be.
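The strategy described in the abstract can be illustrated on a toy problem. The following is a minimal sketch, not the authors' implementation: it assumes an explicit Hessian is available, uses a small quadratic loss instead of a neural network, and uses a hand-picked basis as a stand-in for the class logit gradients. The function name `hybrid_step` and the parameters `lr` and `damping` are hypothetical.

```python
import numpy as np

def hybrid_step(w, grad, hess, basis, lr=0.1, damping=1e-6):
    """One update: Newton step inside span(basis), plain gradient step
    in the orthogonal complement (the co-space).

    basis: array of shape (d, k) whose columns span the assumed
    high-curvature subspace (a stand-in for the class logit gradients).
    """
    # Orthonormalise the basis so that Q @ Q.T is a projector.
    Q, _ = np.linalg.qr(basis)               # Q has shape (d, k)

    # Gradient and Hessian restricted to the subspace.
    g_sub = Q.T @ grad                        # shape (k,)
    H_sub = Q.T @ hess @ Q                    # shape (k, k)

    # Damped Newton step, solved in subspace coordinates and mapped back.
    k = Q.shape[1]
    newton = Q @ np.linalg.solve(H_sub + damping * np.eye(k), g_sub)

    # Remaining gradient component, orthogonal to the subspace.
    g_perp = grad - Q @ g_sub

    return w - newton - lr * g_perp

# Toy quadratic loss 0.5 * w^T H w with two high-curvature directions.
H = np.diag([100.0, 50.0, 1.0, 1.0, 1.0])
basis = np.eye(5)[:, :2]                      # assumed high-curvature subspace
w = np.ones(5)
w_new = hybrid_step(w, H @ w, H, basis, lr=0.1)
```

On this exactly quadratic toy loss, a single step drives the two high-curvature coordinates essentially to zero while the remaining coordinates take an ordinary gradient step. That is the intuition behind the strategy; the abstract reports that on deep networks a naive implementation of it slows convergence instead.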


Taxonomy

Topics: Iterative Methods for Nonlinear Equations · Advanced Numerical Analysis Techniques · Advanced Optimization Algorithms Research