Quasi-Newton Optimization Methods For Deep Learning Applications
Jacob Rafati, Roummel F. Marcia

TL;DR
This paper introduces L-BFGS quasi-Newton optimization methods with line search and trust-region strategies for deep learning, demonstrating faster convergence and better generalization compared to traditional first-order methods.
Contribution
It proposes efficient L-BFGS-based optimization algorithms tailored for deep learning, combining second-order benefits with computational practicality, and provides convergence analysis and empirical validation.
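For orientation, the textbook formulation behind quasi-Newton line-search and trust-region methods can be written as follows. This is a sketch in standard notation, not the paper's own: the symbols $w_k$ (parameters), $g_k$ (gradient), $B_k$ (Hessian approximation), $p$ (step), and $\delta_k$ (trust-region radius) are assumptions introduced here.

```latex
% Quadratic model of the objective f around the current iterate w_k
m_k(p) = f(w_k) + g_k^{\top} p + \tfrac{1}{2}\, p^{\top} B_k\, p,
\qquad g_k = \nabla f(w_k)

% Secant condition that the updated approximation B_{k+1} must satisfy
B_{k+1} s_k = y_k,
\qquad s_k = w_{k+1} - w_k, \quad y_k = g_{k+1} - g_k

% Line search: take the model minimizer as a direction, then choose a step length
p_k = -B_k^{-1} g_k, \qquad w_{k+1} = w_k + \alpha_k p_k

% Trust region: minimize the model only where it is trusted
p_k = \arg\min_{\|p\|_2 \le \delta_k} m_k(p)
```

L-BFGS never forms $B_k$ explicitly; it keeps only the most recent $(s_k, y_k)$ pairs and applies the approximation implicitly.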
Findings
Achieves robust convergence in deep learning tasks.
Demonstrates faster training times than SGD.
Shows improved generalization in experiments.
Abstract
Deep learning algorithms often require solving a highly non-linear and nonconvex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like stochastic gradient descent (SGD). While SGD iterates are inexpensive to compute, they have slow theoretical convergence rates. Furthermore, they require exhaustive trial-and-error to fine-tune many learning parameters. Using second-order curvature information to find search directions can make convergence more robust for nonconvex optimization problems. However, computing Hessian matrices for large-scale problems is not computationally practical. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the…
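As a rough illustration of how a quasi-Newton method can use curvature without forming a Hessian, here is a minimal sketch of textbook L-BFGS (two-loop recursion) with a backtracking Armijo line search. It is not the paper's algorithm: it is full-batch and deterministic, it omits the trust-region variant, and the names `two_loop_direction`, `lbfgs_minimize`, `memory`, and the toy quadratic are hypothetical choices made for this example.

```python
from collections import deque

import numpy as np


def two_loop_direction(grad, s_hist, y_hist):
    """Return -H @ grad, where H is the implicit L-BFGS inverse-Hessian approximation."""
    q = grad.copy()
    coeffs = []
    # First loop: newest pair to oldest pair.
    for s, y in zip(reversed(s_hist), reversed(y_hist)):
        rho = 1.0 / y.dot(s)
        a = rho * s.dot(q)
        coeffs.append((a, rho))
        q -= a * y
    # Scale by gamma = s'y / y'y from the newest pair (initial inverse-Hessian guess).
    if s_hist:
        s, y = s_hist[-1], y_hist[-1]
        q *= s.dot(y) / y.dot(y)
    # Second loop: oldest pair to newest pair.
    for (a, rho), s, y in zip(reversed(coeffs), s_hist, y_hist):
        b = rho * y.dot(q)
        q += (a - b) * s
    return -q


def lbfgs_minimize(f, grad_f, x0, memory=5, max_iter=200, tol=1e-6):
    """Minimize f with L-BFGS and Armijo backtracking (illustrative sketch only)."""
    x = np.asarray(x0, dtype=float)
    g = grad_f(x)
    s_hist, y_hist = deque(maxlen=memory), deque(maxlen=memory)
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        p = two_loop_direction(g, s_hist, y_hist)
        # Backtracking line search on the sufficient-decrease (Armijo) condition.
        t, fx, slope = 1.0, f(x), g.dot(p)
        while f(x + t * p) > fx + 1e-4 * t * slope and t > 1e-12:
            t *= 0.5
        x_new = x + t * p
        g_new = grad_f(x_new)
        s, y = x_new - x, g_new - g
        # Store the pair only if the curvature condition holds, so H stays positive definite.
        if y.dot(s) > 1e-10 * np.linalg.norm(s) * np.linalg.norm(y):
            s_hist.append(s)
            y_hist.append(y)
        x, g = x_new, g_new
    return x


# Toy problem (assumption for the demo): an ill-conditioned convex quadratic
# f(x) = 0.5 * x' A x - b' x, whose minimizer is A^{-1} b.
A = np.diag([1.0, 10.0, 100.0])
b = np.ones(3)
x_star = lbfgs_minimize(lambda x: 0.5 * x.dot(A @ x) - b.dot(x),
                        lambda x: A @ x - b,
                        np.zeros(3))
```

The memory cost is `memory` pairs of vectors rather than a full matrix, which is what makes this family of methods plausible at the scale of deep networks. Applying it with stochastic mini-batch gradients raises further issues (noisy curvature pairs, for example) that this sketch does not address.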
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · 3D Shape Modeling and Analysis
Methods: Stochastic Gradient Descent
