Global linear convergence of Newton's method without strong-convexity or Lipschitz gradients

Sai Praneeth Karimireddy; Sebastian U. Stich; Martin Jaggi

arXiv:1806.00413·cs.LG·June 4, 2018·24 cites

TL;DR

This paper proves that Newton's method converges globally at a linear rate for a broad class of functions with stable Hessians, including problems that are not strongly convex such as logistic regression, even when the Hessian is approximate and the subproblems are solved only approximately.

Contribution

It establishes an affine-invariant global linear convergence rate for Newton's method that requires neither strong convexity nor Lipschitz gradients.

Findings

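1. Newton's method converges globally at a linear rate for objective functions whose Hessians are stable.

2. The linear rate holds even if an approximate Hessian is used and the subproblems are solved only approximately.

3. Under similar conditions, first-order methods would achieve only a sublinear $O(1/t^2)$ rate, so the result gives a theoretical advantage to Newton's method.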

Abstract

We show that Newton's method converges globally at a linear rate for objective functions whose Hessians are stable. This class of problems includes many functions which are not strongly convex, such as logistic regression. Our linear convergence result is (i) affine-invariant, and holds even if an (ii) approximate Hessian is used, and if the subproblems are (iii) only solved approximately. Thus we theoretically demonstrate the superiority of Newton's method over first-order methods, which would only achieve a sublinear $O(1/t^2)$ rate under similar conditions.
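To make the setting in the abstract concrete, the following is a minimal sketch of a Newton iteration on unregularized logistic regression, a problem that is not strongly convex. It is an illustration only and not the paper's algorithm: the backtracking (Armijo) line search, the function names, the synthetic data, and all constants are assumptions chosen for the example, and the abstract does not specify a step-size rule.

```python
import numpy as np

def sigmoid(z):
    # Numerically stable logistic function 1 / (1 + exp(-z)).
    return 0.5 * (1.0 + np.tanh(0.5 * z))

def loss(w, X, y):
    # Mean logistic loss for labels y in {-1, +1}.
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def grad(w, X, y):
    # Gradient of the mean logistic loss.
    return -(X.T @ (y * sigmoid(-y * (X @ w)))) / len(y)

def hess(w, X, y):
    # Hessian X^T diag(s * (1 - s)) X / n, with s = sigmoid(margin).
    s = sigmoid(y * (X @ w))
    return (X.T * (s * (1.0 - s))) @ X / len(y)

def newton(X, y, iters=20, armijo=0.25, shrink=0.5):
    # Newton's method with a backtracking line search.
    # ASSUMPTION: the line search is a standard safeguard added for this
    # sketch; it is not taken from the paper.
    w = np.zeros(X.shape[1])
    losses = [loss(w, X, y)]
    for _ in range(iters):
        g = grad(w, X, y)
        d = np.linalg.solve(hess(w, X, y), g)  # Newton direction H^{-1} g
        decrement = g @ d                      # g^T H^{-1} g
        step = 1.0
        # Halve the step until the Armijo sufficient-decrease test passes.
        while step > 1e-8 and loss(w - step * d, X, y) > losses[-1] - armijo * step * decrement:
            step *= shrink
        if step <= 1e-8:
            break  # no acceptable step found; stop early
        w = w - step * d
        losses.append(loss(w, X, y))
    return w, np.array(losses)

# Hypothetical synthetic data with noisy labels (an assumption for the demo).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 3))
y_demo = np.where(X_demo @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200) >= 0, 1.0, -1.0)
w_demo, losses_demo = newton(X_demo, y_demo)
```

Plotting `losses_demo - losses_demo[-1]` on a logarithmic axis is one way to inspect the empirical convergence behaviour on a given dataset. Because every step is computed from the Hessian, rescaling the columns of `X_demo` changes the iterates only through the corresponding change of variables, which is the affine invariance the abstract refers to.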

Taxonomy

Topics: Sparse and Compressive Sensing Techniques · Advanced Optimization Algorithms Research · Stochastic Gradient Optimization Techniques