Quasi-Newton Optimization Methods For Deep Learning Applications

Jacob Rafati; Roummel F. Marcia

arXiv:1909.01994·cs.LG·September 6, 2019

TL;DR

This paper introduces L-BFGS quasi-Newton optimization methods with line search and trust-region strategies for deep learning, demonstrating faster convergence and better generalization compared to traditional first-order methods.

Contribution

It proposes efficient L-BFGS-based optimization algorithms tailored for deep learning, combining second-order benefits with computational practicality, and provides convergence analysis and empirical validation.

Findings

1. Achieves robust convergence in deep learning tasks.
2. Demonstrates faster training times than SGD.
3. Shows improved generalization in experiments.

Abstract

Deep learning algorithms often require solving a highly non-linear and non-convex unconstrained optimization problem. Methods for solving optimization problems in large-scale machine learning, such as deep learning and deep reinforcement learning (RL), are generally restricted to the class of first-order algorithms, like stochastic gradient descent (SGD). While SGD iterates are inexpensive to compute, they have slow theoretical convergence rates. Furthermore, they require exhaustive trial-and-error to fine-tune many learning parameters. Using second-order curvature information to find search directions can lead to more robust convergence for non-convex optimization problems. However, computing Hessian matrices for large-scale problems is not computationally practical. Alternatively, quasi-Newton methods construct an approximation of the Hessian matrix to build a quadratic model of the…
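
The quasi-Newton idea summarized in the abstract can be illustrated with a minimal sketch of limited-memory BFGS (L-BFGS). This is not the authors' implementation: the function names (`two_loop_direction`, `lbfgs_minimize`), the memory size, the Armijo constant, and the toy quadratic objective are all assumptions chosen for illustration. The sketch uses a simple backtracking line search only and does not cover the trust-region strategy mentioned in the summary above.

```python
from collections import deque


def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(a, b))


def two_loop_direction(grad, history):
    """Return -H * grad, where H is the implicit L-BFGS inverse-Hessian
    approximation built from stored (s, y) pairs (oldest first)."""
    q = list(grad)
    coeffs = []
    # First loop: walk the pairs from newest to oldest.
    for s, y in reversed(history):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        coeffs.append((alpha, rho))
    # Scale the initial matrix using the most recent pair (a common heuristic).
    if history:
        s, y = history[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: walk the pairs from oldest to newest.
    for (s, y), (alpha, rho) in zip(history, reversed(coeffs)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return [-ri for ri in r]


def lbfgs_minimize(f, grad_f, x0, memory=5, max_iter=100, tol=1e-8):
    """Minimize f with L-BFGS and a backtracking (Armijo) line search."""
    x = list(x0)
    g = grad_f(x)
    history = deque(maxlen=memory)  # keeps only the last `memory` pairs
    for _ in range(max_iter):
        if dot(g, g) ** 0.5 < tol:
            break
        p = two_loop_direction(g, history)
        t, fx, slope = 1.0, f(x), dot(g, p)
        # Halve the step until the sufficient-decrease condition holds.
        while t > 1e-12 and f([xi + t * pi for xi, pi in zip(x, p)]) > fx + 1e-4 * t * slope:
            t *= 0.5
        x_new = [xi + t * pi for xi, pi in zip(x, p)]
        g_new = grad_f(x_new)
        s = [a - b for a, b in zip(x_new, x)]
        y = [a - b for a, b in zip(g_new, g)]
        # Store the pair only if the curvature condition s^T y > 0 holds.
        if dot(s, y) > 1e-10:
            history.append((s, y))
        x, g = x_new, g_new
    return x


# Hypothetical toy problem: an ill-conditioned convex quadratic with minimum at (1, -2).
def toy_f(x):
    return (x[0] - 1.0) ** 2 + 10.0 * (x[1] + 2.0) ** 2


def toy_grad(x):
    return [2.0 * (x[0] - 1.0), 20.0 * (x[1] + 2.0)]


x_star = lbfgs_minimize(toy_f, toy_grad, [0.0, 0.0])
print(x_star)
```

The design point is that the Hessian is never formed: only the last few pairs of parameter differences `s` and gradient differences `y` are stored, so memory and per-step cost grow linearly with the number of parameters. In a deep learning setting the gradients would come from mini-batches, which this deterministic sketch does not model.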

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · 3D Shape Modeling and Analysis

Methods: Stochastic Gradient Descent