Efficient Stochastic BFGS methods Inspired by Bayesian Principles
André Carlon, Luis Espath, Raúl Tempone

TL;DR
This paper introduces a Bayesian-inspired approach to develop stochastic quasi-Newton methods, specifically stochastic BFGS and L-BFGS, that efficiently learn inverse Hessian approximations in noisy gradient settings.
Contribution
The paper proposes a novel Bayesian inference-based methodology to derive stochastic quasi-Newton methods, enabling effective inverse Hessian approximation with small batch sizes.
Findings
Effective inverse Hessian learning with small batch sizes.
Experiments in up to 30,720 dimensions show robustness.
Iteration costs of O(d^2) for S-BFGS and O(d) for L-S-BFGS.
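The O(d) cost of limited-memory methods comes from never forming a d × d matrix. The minimal sketch below is the standard deterministic L-BFGS two-loop recursion, not the paper's L-S-BFGS, which assimilates noisy gradients differently. It applies the inverse Hessian approximation to a vector using only the m most recent curvature pairs, so each iteration costs O(md) work, which is O(d) for fixed memory m. The function name `two_loop_recursion` and the initial scaling gamma = s^T y / y^T y are conventional choices assumed here, not details taken from the paper.

```python
import numpy as np


def two_loop_recursion(grad, s_list, y_list):
    """Return H_k @ grad via the standard (deterministic) L-BFGS two-loop recursion.

    s_list[i] = x_{i+1} - x_i and y_list[i] = g_{i+1} - g_i are the m most
    recent curvature pairs, oldest first. Each loop does O(d) vector work per
    pair, so the total cost is O(m d), i.e. O(d) for a fixed memory size m.
    """
    q = np.array(grad, dtype=float)  # copy so the caller's gradient is untouched
    rhos = [1.0 / np.dot(y, s) for s, y in zip(s_list, y_list)]
    alphas = []
    # First loop: newest pair to oldest.
    for s, y, rho in reversed(list(zip(s_list, y_list, rhos))):
        alpha = rho * np.dot(s, q)
        q -= alpha * y
        alphas.append(alpha)  # stored newest first
    # Initial inverse Hessian H_0 = gamma * I (assumed conventional scaling).
    if s_list:
        s, y = s_list[-1], y_list[-1]
        gamma = np.dot(s, y) / np.dot(y, y)
    else:
        gamma = 1.0
    r = gamma * q
    # Second loop: oldest pair to newest, so the stored alphas are reversed.
    for (s, y, rho), alpha in zip(zip(s_list, y_list, rhos), reversed(alphas)):
        beta = rho * np.dot(y, r)
        r += s * (alpha - beta)
    return r
```

Passing the newest y as the "gradient" returns the newest s, which is the secant condition that every BFGS-type update enforces; with an empty history the function simply returns the gradient unchanged.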
Abstract
Quasi-Newton methods are ubiquitous in deterministic local search due to their efficiency and low computational cost. This class of methods uses the history of gradient evaluations to approximate second-order derivatives. However, only noisy gradient observations are accessible in stochastic optimization; thus, deriving quasi-Newton methods in this setting is challenging. Although most existing quasi-Newton methods for stochastic optimization rely on deterministic equations that are modified to circumvent noise, we propose a new approach inspired by Bayesian inference to assimilate noisy gradient information and derive the stochastic counterparts to standard quasi-Newton methods. We focus on the derivations of stochastic BFGS and L-BFGS, but our methodology can also be employed to derive stochastic analogs of other quasi-Newton methods. The resulting stochastic BFGS (S-BFGS) and…
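"Using the history of gradient evaluations to approximate second-order derivatives" refers to updates such as the classical BFGS inverse Hessian formula, H+ = (I - rho s y^T) H (I - rho y s^T) + rho s s^T with rho = 1 / (y^T s), where s is the step and y the gradient difference. The sketch below is that standard deterministic update, not the paper's S-BFGS, and the helper name `bfgs_inverse_update` is hypothetical. Expanded as shown, it needs one matrix-vector product plus outer products, hence O(d^2) work per call. It also requires the curvature condition y^T s > 0; when y is built from noisy gradients this condition can fail, which is one concrete reason the deterministic update does not transfer directly to stochastic optimization.

```python
import numpy as np


def bfgs_inverse_update(H, s, y):
    """Standard (deterministic) BFGS update of a symmetric inverse Hessian approximation H.

    Computes (I - rho s y^T) H (I - rho y s^T) + rho s s^T, rho = 1 / (y^T s),
    in the expanded form below, which avoids matrix-matrix products (O(d^2)).
    """
    sy = float(np.dot(s, y))
    if sy <= 0.0:
        # Without y^T s > 0 the update can destroy positive definiteness.
        raise ValueError("curvature condition y^T s > 0 violated")
    rho = 1.0 / sy
    Hy = H @ y                  # one O(d^2) matrix-vector product
    yHy = float(np.dot(y, Hy))
    return (H
            - rho * (np.outer(Hy, s) + np.outer(s, Hy))
            + (rho * rho * yHy + rho) * np.outer(s, s))
```

On a noise-free quadratic with y = A s, the updated matrix satisfies the secant condition H+ y = s and stays symmetric positive definite; flipping the sign of y triggers the curvature check.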
