Symmetric Rank-One Quasi-Newton Methods for Deep Learning Using Cubic Regularization
Aditya Ranganath, Mukesh Singhal, Roummel Marcia

TL;DR
This paper introduces a symmetric rank-one quasi-Newton method with cubic regularization for deep learning, exploiting curvature information and negative curvature directions to improve convergence over traditional first-order methods.
Contribution
It pairs a symmetric rank-one Hessian approximation, which, unlike L-BFGS, is allowed to be indefinite, with adaptive cubic regularization, so that the optimizer can make use of negative curvature in non-convex deep learning models.
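In adaptive cubic regularization, the "adaptive" part usually refers to how the cubic weight σ is tuned from one iteration to the next. The sketch below assumes the standard ratio test from the adaptive-regularization (ARC) framework of Cartis, Gould and Toint: compare the actual decrease in the loss with the decrease predicted by the cubic model, then shrink or grow σ. The function names and the thresholds eta1, eta2, gamma and sigma_min are hypothetical defaults chosen for illustration and need not match the paper.

```python
import numpy as np

def cubic_model_decrease(g, B, sigma, p):
    """Predicted decrease m(0) - m(p) of the cubic model
    m(p) = g^T p + 0.5 p^T B p + (sigma / 3) ||p||^3."""
    return -(g @ p + 0.5 * p @ B @ p + sigma / 3.0 * np.linalg.norm(p) ** 3)

def update_sigma(f_old, f_new, g, B, sigma, p,
                 eta1=0.1, eta2=0.75, gamma=2.0, sigma_min=1e-8):
    """ARC-style acceptance test; returns (accept_step, new_sigma).
    eta1, eta2, gamma and sigma_min are hypothetical defaults."""
    rho = (f_old - f_new) / cubic_model_decrease(g, B, sigma, p)
    if rho >= eta2:                      # very successful: loosen regularization
        return True, max(sigma / gamma, sigma_min)
    if rho >= eta1:                      # successful: keep sigma unchanged
        return True, sigma
    return False, gamma * sigma          # unsuccessful: reject step, tighten

# Usage on f(x) = 0.5 ||x||^2 from x = (1, 1): gradient g = x, Hessian B = I.
def f(x):
    return 0.5 * x @ x

x = np.array([1.0, 1.0])
g, B, sigma = x.copy(), np.eye(2), 1.0
p = -0.5 * g                              # a trial step
accept, sigma_new = update_sigma(f(x), f(x + p), g, B, sigma, p)
print(accept, sigma_new)  # True 0.5
```

Here the loss drops by more than the model predicted (ratio above eta2), so the step is accepted and σ is halved; a step that increases the loss would be rejected and σ doubled.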
Findings
Outperforms first-order adaptive methods on neural networks
Effectively exploits negative curvature directions
Demonstrates improved convergence in experiments
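Negative curvature is what separates a cubic-regularized step from a gradient step: at a saddle point the gradient vanishes, yet the cubic model can still be decreased along an eigenvector with a negative eigenvalue. The sketch below computes the global minimizer of m(p) = gᵀp + ½pᵀBp + (σ/3)‖p‖³ for a small dense, possibly indefinite B, using an eigendecomposition and bisection on the optimality condition λ = σ‖p‖ with (B + λI)p = -g. It assumes the Euclidean norm in the cubic term; the name cubic_step is hypothetical, and a practical deep-learning implementation would presumably exploit the low-rank structure of the quasi-Newton matrix rather than form and factor B densely.

```python
import numpy as np

def cubic_step(g, B, sigma, max_iter=200):
    """Global minimizer of m(p) = g^T p + 0.5 p^T B p + (sigma / 3) ||p||^3
    for a symmetric, possibly indefinite B (dense eigendecomposition)."""
    d, Q = np.linalg.eigh(B)             # ascending eigenvalues, orthonormal eigenvectors
    gh = Q.T @ g                         # gradient expressed in the eigenbasis
    lam_lo = max(0.0, -d[0])             # B + lam * I must be positive semidefinite

    def p_of(lam):
        # p(lam) = -(B + lam I)^{-1} g, dropping terms whose denominator is zero
        denom = d + lam
        coef = np.zeros_like(gh)
        nz = denom > 0
        coef[nz] = -gh[nz] / denom[nz]
        return Q @ coef

    # Hard case: g has no component along the leftmost eigenvector and
    # p(lam_lo) is shorter than lam_lo / sigma, so the step is completed by
    # moving along that (negative-curvature) eigenvector.
    p_lo = p_of(lam_lo)
    if np.all(np.abs(gh[d + lam_lo <= 0]) < 1e-14) and np.linalg.norm(p_lo) <= lam_lo / sigma:
        tau = np.sqrt((lam_lo / sigma) ** 2 - np.linalg.norm(p_lo) ** 2)
        return p_lo + tau * Q[:, 0]

    # Easy case: find lam > lam_lo with ||p(lam)|| = lam / sigma by bisection.
    lo, hi = lam_lo, lam_lo + 1.0
    while np.linalg.norm(p_of(hi)) > hi / sigma:   # grow hi until it brackets the root
        lo, hi = hi, 2.0 * hi
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(p_of(mid)) > mid / sigma:
            lo = mid
        else:
            hi = mid
    return p_of(hi)

# Saddle of f(x) = 0.5 * (x0^2 - x1^2) at the origin: the gradient is zero,
# so a gradient step does not move, but the cubic step follows x1.
B = np.diag([1.0, -1.0])
p = cubic_step(np.zeros(2), B, sigma=1.0)
print(np.round(np.abs(p), 6))  # [0. 1.]
```

The step has length λ/σ = 1 along the negative-curvature direction (its sign is arbitrary), which is exactly the move a first-order method cannot make from a stationary point.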
Abstract
Stochastic gradient descent (SGD) and first-order variants such as Adam and AdaGrad are commonly used in deep learning because of their computational efficiency and low memory requirements. However, these methods do not exploit curvature information. Consequently, iterates can converge to saddle points or poor local minima. Quasi-Newton methods, on the other hand, compute Hessian approximations that exploit this information at a comparable computational cost. They re-use previously computed iterates and gradients to build a low-rank, structured update. The most widely used quasi-Newton update is L-BFGS, which guarantees a positive semi-definite Hessian approximation, making it suitable in a line search setting. However, the loss functions of deep neural networks (DNNs) are non-convex, so their Hessians may not be positive definite. In this paper, we propose…
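The contrast between an L-BFGS-style update and an indefinite Hessian can be seen on a two-dimensional quadratic whose Hessian has eigenvalues 2 and -1. The sketch below applies the textbook dense BFGS and SR1 update formulas to the same pair of curvature measurements. It assumes the usual safeguards (BFGS skips a pair when the curvature condition yᵀs > 0 fails; SR1 skips when its denominator is tiny); the function names and the threshold r are hypothetical, and this is a full-memory illustration, not the authors' implementation.

```python
import numpy as np

def sr1_update(B, s, y, r=1e-8):
    """Symmetric rank-one update; the result may be indefinite."""
    v = y - B @ s
    denom = v @ s
    # Safeguard: skip when the denominator is tiny relative to ||s|| ||v||
    # (this also skips the update when v = 0, i.e. B already fits the pair).
    if abs(denom) <= r * np.linalg.norm(s) * np.linalg.norm(v):
        return B
    return B + np.outer(v, v) / denom

def bfgs_update(B, s, y):
    """BFGS update; skipped when y^T s <= 0, which keeps B positive definite."""
    if y @ s <= 0:
        return B
    Bs = B @ s
    return B - np.outer(Bs, Bs) / (s @ Bs) + np.outer(y, y) / (y @ s)

# Quadratic f(x) = 0.5 x^T H x with an indefinite Hessian, so y = H s exactly.
H = np.diag([2.0, -1.0])
B_sr1 = np.eye(2)
B_bfgs = np.eye(2)
for s in (np.array([1.0, 0.0]), np.array([0.0, 1.0])):
    y = H @ s
    B_sr1 = sr1_update(B_sr1, s, y)
    B_bfgs = bfgs_update(B_bfgs, s, y)

print(np.linalg.eigvalsh(B_sr1))   # approx. [-1, 2]: SR1 recovers the negative curvature
print(np.linalg.eigvalsh(B_bfgs))  # approx. [1, 2]: BFGS stays positive definite
```

On the second pair yᵀs = -1 < 0, so BFGS must discard the only measurement of negative curvature, while SR1 incorporates it and reproduces H exactly.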
Taxonomy
Topics: Model Reduction and Neural Networks · Numerical methods in inverse problems · Advanced Numerical Analysis Techniques
Methods: Adam · AdaGrad
