A Stochastic Quasi-Newton Method for Large-Scale Optimization

R.H. Byrd; S.L. Hansen; J. Nocedal; Y. Singer

arXiv:1401.7020·math.OC·February 19, 2015·46 cites

TL;DR

This paper introduces a stochastic quasi-Newton method for large-scale machine learning problems. It combines limited-memory BFGS updates with sub-sampled Hessian-vector products to obtain curvature estimates that are less noisy than those from gradient differences.

Contribution

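The paper proposes a stochastic quasi-Newton algorithm that collects curvature information pointwise, at regular intervals, through sub-sampled Hessian-vector products instead of differences of stochastic gradients. The authors describe the resulting method as efficient, robust and scalable.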
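As an illustration of what a pointwise, sub-sampled Hessian-vector product can look like, here is a minimal sketch for a binary logistic-regression loss. The choice of loss, the function names, and the way the displacement `s` is formed from two iterates are assumptions made for this example, not details taken from this page.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def subsampled_hessian_vector_product(w, v, X, batch):
    """Hessian-vector product of the mean logistic loss over `batch`.

    For logistic loss the Hessian is (1/b) * sum_i p_i (1 - p_i) x_i x_i^T,
    so H v = (1/b) * sum_i p_i (1 - p_i) (x_i . v) x_i.
    The Hessian matrix itself is never formed.
    """
    out = [0.0] * len(w)
    for i in batch:
        x = X[i]
        p = sigmoid(dot(w, x))
        coeff = p * (1.0 - p) * dot(x, v)
        for j, xj in enumerate(x):
            out[j] += coeff * xj
    return [o / len(batch) for o in out]


def curvature_pair(w_new, w_old, X, batch):
    """Hypothetical helper: build (s, y) with y = H_batch(w_new) s."""
    s = [a - b for a, b in zip(w_new, w_old)]
    y = subsampled_hessian_vector_product(w_new, s, X, batch)
    return s, y
```

Because this sub-sampled Hessian is positive semidefinite, the pair satisfies `dot(s, y) >= 0` by construction, which is the kind of control over curvature quality that plain gradient differencing on different mini-batches does not guarantee.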

Findings

01 Shows promising results on machine learning problems

02 Demonstrates improved robustness over classical methods

03 Scalable to large datasets

Abstract

The question of how to incorporate curvature information in stochastic approximation methods is challenging. The direct application of classical quasi-Newton updating techniques for deterministic optimization leads to noisy curvature estimates that have harmful effects on the robustness of the iteration. In this paper, we propose a stochastic quasi-Newton method that is efficient, robust and scalable. It employs the classical BFGS update formula in its limited memory form, and is based on the observation that it is beneficial to collect curvature information pointwise, and at regular intervals, through (sub-sampled) Hessian-vector products. This technique differs from the classical approach that would compute differences of gradients, and where controlling the quality of the curvature estimates can be difficult. We present numerical results on problems arising in machine learning that…
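The limited memory form of BFGS mentioned in the abstract is usually applied through the standard two-loop recursion, which multiplies a gradient by the inverse-Hessian approximation using only the stored curvature pairs. The sketch below shows that textbook recursion; the function name, the list-based vectors, and the initial scaling `gamma` are choices made for this example and are not claimed to match the paper's implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def lbfgs_direction(grad, pairs):
    """Apply the L-BFGS inverse-Hessian approximation to `grad`.

    `pairs` is a list of (s, y) curvature pairs, oldest first,
    each assumed to satisfy s . y > 0.
    """
    q = list(grad)
    history = []
    # First loop: newest pair to oldest.
    for s, y in reversed(pairs):
        rho = 1.0 / dot(y, s)
        alpha = rho * dot(s, q)
        q = [qi - alpha * yi for qi, yi in zip(q, y)]
        history.append((rho, alpha))
    # Initial scaling from the most recent pair (a common choice).
    if pairs:
        s, y = pairs[-1]
        gamma = dot(s, y) / dot(y, y)
    else:
        gamma = 1.0
    r = [gamma * qi for qi in q]
    # Second loop: oldest pair to newest.
    for (s, y), (rho, alpha) in zip(pairs, reversed(history)):
        beta = rho * dot(y, r)
        r = [ri + (alpha - beta) * si for ri, si in zip(r, s)]
    return r
```

A stochastic iteration would then step along the negative of `lbfgs_direction(g, pairs)`, where `g` is a mini-batch gradient. With no stored pairs the function returns the gradient unchanged, so the step reduces to plain stochastic gradient descent.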

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research