Quasi-Newton's method in the class gradient defined high-curvature subspace
Mark Tuddenham, Adam Prügel-Bennett, Jonathan Hare

TL;DR
This paper investigates using Quasi-Newton's method within the high-curvature subspace of deep learning loss landscapes, with stochastic gradient descent in the co-space, and finds that a naive implementation slows convergence rather than speeding it up.
Contribution
It applies Quasi-Newton's method specifically in the high-curvature subspace of deep neural network classification loss landscapes, analyzes its effectiveness, and speculates on why the naive version underperforms.
Findings
Naive implementation of Quasi-Newton's method slows convergence.
The high-curvature subspace has dimension equal to the number of classes and is spanned by the logit gradients for each class.
Potential for combining Newton's method in this subspace with SGD elsewhere.
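As an illustration of the second finding, the sketch below builds an orthonormal basis for the span of per-class logit gradients and splits a loss gradient into its subspace and co-space parts. It is a minimal sketch, not the authors' code: the names `subspace_basis` and `split_gradient` are hypothetical, and the logit gradients are random stand-in data rather than gradients from a trained network.

```python
import numpy as np

def subspace_basis(logit_grads):
    """Orthonormal basis (P, C) for the span of C logit-gradient rows (C, P)."""
    # Reduced QR of the (P, C) matrix whose columns are the logit gradients.
    q, _ = np.linalg.qr(logit_grads.T)
    return q

def split_gradient(grad, basis):
    """Split grad into its component inside the subspace and in the co-space."""
    g_sub = basis @ (basis.T @ grad)  # orthogonal projection onto the subspace
    return g_sub, grad - g_sub

# Assumption: random stand-in data in place of real per-class logit gradients.
rng = np.random.default_rng(0)
num_classes, num_params = 3, 10
logit_grads = rng.normal(size=(num_classes, num_params))
grad = rng.normal(size=num_params)

basis = subspace_basis(logit_grads)
g_sub, g_co = split_gradient(grad, basis)
```

Here `g_sub` lies in the class-gradient subspace and `g_co` is orthogonal to it, so the two parts can be handed to different update rules.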
Abstract
Classification problems using deep learning have been shown to have a high-curvature subspace in the loss landscape equal in dimension to the number of classes. Moreover, this subspace corresponds to the subspace spanned by the logit gradients for each class. An obvious strategy to speed up optimisation would be to use Newton's method in the high-curvature subspace and stochastic gradient descent in the co-space. We show that a naive implementation actually slows down convergence and we speculate why this might be.
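The strategy described in the abstract can be sketched on a toy quadratic as follows. This is an assumption-laden illustration rather than the paper's implementation: it uses exact Hessian-vector products instead of a quasi-Newton estimate, the function `hybrid_step` and its `damping` parameter are hypothetical, and the toy problem is built so that the chosen subspace is exactly the high-curvature eigenspace. It therefore does not reproduce the slowdown the authors report on real networks.

```python
import numpy as np

def hybrid_step(params, grad, hess_vec, basis, lr=0.1, damping=0.0):
    """One update: Newton step inside span(basis), plain gradient step in the co-space."""
    k = basis.shape[1]
    # Hessian restricted to the subspace, built from k Hessian-vector products.
    h_cols = np.stack([hess_vec(basis[:, i]) for i in range(k)], axis=1)
    h_sub = basis.T @ h_cols                      # shape (k, k)
    g_coef = basis.T @ grad                       # gradient coordinates in the subspace
    newton_coef = np.linalg.solve(h_sub + damping * np.eye(k), g_coef)
    g_co = grad - basis @ g_coef                  # co-space part of the gradient
    return params - basis @ newton_coef - lr * g_co

# Toy quadratic f(w) = 0.5 * w^T H w with two high-curvature directions.
rng = np.random.default_rng(0)
u, _ = np.linalg.qr(rng.normal(size=(6, 6)))      # random orthonormal eigenvectors
eigvals = np.array([100.0, 50.0, 1.0, 1.0, 1.0, 1.0])
hessian = u @ np.diag(eigvals) @ u.T
basis = u[:, :2]                                  # high-curvature subspace

w = rng.normal(size=6)
w_new = hybrid_step(w, hessian @ w, lambda v: hessian @ v, basis, lr=0.1)
```

On this idealised problem the Newton part removes the subspace component of `w` in one step, while the co-space component shrinks by the factor `1 - lr`. In a deep network the subspace and the curvature estimate both change from batch to batch, which is one place a naive version could go wrong.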
Taxonomy
Topics: Iterative Methods for Nonlinear Equations · Advanced Numerical Analysis Techniques · Advanced Optimization Algorithms Research
