Exact and Inexact Subsampled Newton Methods for Optimization

Raghu Bollapragada; Richard Byrd; Jorge Nocedal

arXiv:1609.08502·math.OC·September 28, 2016

TL;DR

This paper investigates subsampled Newton methods for stochastic optimization, analyzing their convergence, complexity, and practical performance in machine learning tasks like logistic regression.

Contribution

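The paper analyzes when Newton-like methods with subsampled gradients and Hessians converge superlinearly in expectation, and it gives a complexity analysis of an inexact Newton method that solves the Newton system approximately with conjugate gradient (CG) using a subsampled Hessian.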

Findings

1. Superlinear convergence in expectation is achieved when the accuracy of the gradient and Hessian approximations is properly coordinated.

2. A complexity analysis is given for an inexact Newton method that combines Hessian sampling with CG.

3. Preliminary numerical results show promising performance on logistic regression tasks.
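
For orientation, the Newton-like iteration behind the first finding can be written roughly as below. The notation is assumed here for illustration and is not taken from this page: $X_k$ is the sample used for the gradient, $S_k$ the sample used for the Hessian, $F_i$ the loss on example $i$, and $\alpha_k$ a step length.

```latex
\begin{aligned}
\nabla F_{X_k}(w_k) &= \frac{1}{|X_k|}\sum_{i\in X_k}\nabla F_i(w_k), \qquad
\nabla^2 F_{S_k}(w_k) = \frac{1}{|S_k|}\sum_{i\in S_k}\nabla^2 F_i(w_k),\\
w_{k+1} &= w_k - \alpha_k\,\bigl[\nabla^2 F_{S_k}(w_k)\bigr]^{-1}\,\nabla F_{X_k}(w_k).
\end{aligned}
```

Under this reading, coordinating the two accuracies amounts to deciding how the sample sizes $|X_k|$ and $|S_k|$ are chosen from one iteration to the next.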

Abstract

The paper studies the solution of stochastic optimization problems in which approximations to the gradient and Hessian are obtained through subsampling. We first consider Newton-like methods that employ these approximations and discuss how to coordinate the accuracy in the gradient and Hessian to yield a superlinear rate of convergence in expectation. The second part of the paper analyzes an inexact Newton method that solves linear systems approximately using the conjugate gradient (CG) method, and that samples the Hessian and not the gradient (the gradient is assumed to be exact). We provide a complexity analysis for this method based on the properties of the CG iteration and the quality of the Hessian approximation, and compare it with a method that employs a stochastic gradient iteration instead of the CG method. We report preliminary numerical results that illustrate the performance…
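
The second method described in the abstract (exact gradient, subsampled Hessian, linear system solved approximately by CG) can be sketched as follows for L2-regularized logistic regression. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the regularization `lam`, the sample size, the CG budget, and the backtracking line search are all hypothetical choices made for this sketch.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function: 1 / (1 + exp(-z)).
    return np.exp(-np.logaddexp(0.0, -z))


def loss_and_grad(w, X, y, lam):
    """Full-data regularized logistic loss and its exact gradient (labels in {0, 1})."""
    z = X @ w
    loss = np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * lam * (w @ w)
    grad = X.T @ (sigmoid(z) - y) / len(y) + lam * w
    return loss, grad


def subsampled_hessian_vec(w, X_s, lam):
    """Return the map v -> H_S v, where H_S is the Hessian on the subsample X_s."""
    p = sigmoid(X_s @ w)
    d = p * (1.0 - p)
    return lambda v: X_s.T @ (d * (X_s @ v)) / X_s.shape[0] + lam * v


def cg(hv, b, max_iter, tol):
    """Conjugate gradient for H x = b, stopped after max_iter steps or at relative residual tol."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rs = r @ r
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        if np.sqrt(rs) <= tol * b_norm:
            break
        Hp = hv(p)
        alpha = rs / (p @ Hp)
        x = x + alpha * p
        r = r - alpha * Hp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def subsampled_newton_cg(X, y, lam=1e-3, sample_size=200,
                         cg_iters=10, cg_tol=1e-2, n_iter=20, seed=0):
    """Inexact Newton iteration: exact gradient, subsampled Hessian, truncated CG."""
    rng = np.random.default_rng(seed)
    n = len(y)
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        loss, g = loss_and_grad(w, X, y, lam)
        # Draw a fresh Hessian subsample at every outer iteration.
        idx = rng.choice(n, size=min(sample_size, n), replace=False)
        hv = subsampled_hessian_vec(w, X[idx], lam)
        step = cg(hv, -g, cg_iters, cg_tol)
        # Backtracking (Armijo) line search; an assumption of this sketch.
        t = 1.0
        while t > 1e-8 and loss_and_grad(w + t * step, X, y, lam)[0] > loss + 1e-4 * t * (g @ step):
            t *= 0.5
        w = w + t * step
    return w


if __name__ == "__main__":
    # Synthetic data, purely for demonstration.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(1000, 5))
    w_true = rng.normal(size=5)
    y = (rng.random(1000) < sigmoid(X @ w_true)).astype(float)
    w = subsampled_newton_cg(X, y)
    print("final loss:", loss_and_grad(w, X, y, 1e-3)[0])
```

The Hessian is never formed explicitly: CG only needs Hessian-vector products, so each CG step costs two passes over the subsample rather than over the full data set. That per-iteration cost is what makes the trade-off between sample size and CG budget worth analyzing.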
