A Progressive Batching L-BFGS Method for Machine Learning

Raghu Bollapragada; Dheevatsa Mudigere; Jorge Nocedal; Hao-Jun Michael Shi; Ping Tak Peter Tang

arXiv:1802.05374 · math.OC · May 31, 2018 · 58 cites

Open Access

TL;DR

This paper introduces a new progressive batching L-BFGS algorithm that combines stochastic line search and stable quasi-Newton updates, improving large-scale machine learning optimization.

Contribution

It proposes a novel progressive batching L-BFGS method with convergence guarantees, bridging the gap between full batch and stochastic approaches.

Findings

01 Performs well on logistic regression and neural networks

02 Offers convergence guarantees for the proposed method

03 Balances efficiency and stability in large-scale optimization

Abstract

The standard L-BFGS method relies on gradient approximations that are not dominated by noise, so that search directions are descent directions, the line search is reliable, and quasi-Newton updating yields useful quadratic models of the objective function. All of this appears to call for a full batch approach, but since small batch sizes give rise to faster algorithms with better generalization properties, L-BFGS is currently not considered an algorithm of choice for large-scale machine learning applications. One need not, however, choose between the two extremes represented by the full batch or highly stochastic regimes, and may instead follow a progressive batching approach in which the sample size increases during the course of the optimization. In this paper, we present a new version of the L-BFGS algorithm that combines three basic components - progressive batching, a stochastic…
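
The progressive batching idea described in the abstract, where the sample size grows during the optimization, can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract as shown here is truncated and does not say how the sample size is chosen, so the sketch assumes a generic variance-based "norm test" with geometric growth. The function names and the parameters `theta` and `growth` are hypothetical.

```python
import numpy as np


def norm_test_passes(per_sample_grads, theta=0.9):
    """Return True if the batch gradient looks accurate enough (assumed test).

    per_sample_grads: array of shape (batch_size, dim), one gradient per sample.
                      Assumes batch_size >= 2 so the sample variance is defined.
    theta: hypothetical tolerance; larger values tolerate more gradient noise.
    """
    batch_size = per_sample_grads.shape[0]
    mean_grad = per_sample_grads.mean(axis=0)
    # Sum over coordinates of the unbiased per-coordinate sample variance.
    variance = per_sample_grads.var(axis=0, ddof=1).sum()
    # Compare the estimated variance of the batch mean with the squared norm
    # of the batch gradient.
    return variance / batch_size <= theta**2 * np.dot(mean_grad, mean_grad)


def next_batch_size(per_sample_grads, n_total, theta=0.9, growth=2.0):
    """Keep the batch size if the test passes, otherwise grow it geometrically."""
    batch_size = per_sample_grads.shape[0]
    if norm_test_passes(per_sample_grads, theta):
        return batch_size
    # Never exceed the full data set: the full batch is the limiting case.
    return min(n_total, int(np.ceil(growth * batch_size)))
```

Under these assumptions, a batch whose per-sample gradients agree keeps its size, while a batch dominated by noise is enlarged, up to the full data set. An optimizer could call `next_batch_size` once per iteration before drawing the next sample.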

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms