Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses

Ulysse Marteau-Ferey (SIERRA, DI-ENS, PSL); Francis Bach (SIERRA, DI-ENS, PSL); Alessandro Rudi (SIERRA, DI-ENS, PSL)

arXiv:1907.01771 · math.OC · November 22, 2019 · 6 citations

TL;DR

This paper introduces globally convergent Newton methods for large-scale convex optimization with ill-conditioned, regularized generalized self-concordant losses such as logistic and softmax regression. The convergence rate depends only logarithmically on the condition number, and in the non-parametric setting the method attains optimal complexity and generalization bounds.

Contribution

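The paper proposes a Newton-based scheme that solves a sequence of problems with decreasing regularization parameters. The scheme is provably globally convergent and behaves better than first-order methods on ill-conditioned problems. Combined with Nyström projections, it extends to the non-parametric setting with optimal generalization guarantees.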
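The idea of following a sequence of problems with decreasing regularization can be illustrated on binary logistic regression. The sketch below is a minimal illustration under stated assumptions, not the authors' algorithm. It assumes labels in {-1, +1}, exact Newton steps, a fixed geometric factor `q`, and warm starts from one regularization level to the next. The names `continuation_newton`, `mu0`, `q`, `steps_per_stage` and `final_steps` are hypothetical. The paper's own step and schedule choices, which its convergence analysis covers, are not reproduced here.

```python
import numpy as np


def sigmoid(t):
    # Numerically stable logistic function sigma(t) = 1 / (1 + exp(-t)).
    return 0.5 * (1.0 + np.tanh(0.5 * t))


def loss_grad_hess(w, X, y, mu):
    """Value, gradient and Hessian of the mu-regularized logistic loss
    (1/n) sum_i log(1 + exp(-y_i <x_i, w>)) + (mu / 2) ||w||^2, with y_i in {-1, +1}."""
    n, d = X.shape
    z = y * (X @ w)                          # margins y_i <x_i, w>
    loss = np.mean(np.logaddexp(0.0, -z)) + 0.5 * mu * (w @ w)
    p = sigmoid(-z)                          # -l'(z_i) for l(z) = log(1 + e^{-z})
    grad = -(X.T @ (y * p)) / n + mu * w
    curv = p * (1.0 - p)                     # l''(z_i) = sigma(z_i) * sigma(-z_i)
    hess = (X.T * curv) @ X / n + mu * np.eye(d)
    return loss, grad, hess


def continuation_newton(X, y, lam, mu0=1.0, q=0.5, steps_per_stage=2, final_steps=10):
    """Hypothetical sketch: approach the lam-regularized problem through
    mu0 > q*mu0 > q^2*mu0 > ... > lam, taking a few full Newton steps at each
    level and warm-starting from the previous iterate (no line search)."""
    w = np.zeros(X.shape[1])
    mu = max(mu0, lam)
    while True:
        for _ in range(steps_per_stage):
            _, g, H = loss_grad_hess(w, X, y, mu)
            w = w - np.linalg.solve(H, g)
        if mu == lam:
            break
        mu = max(q * mu, lam)                # geometric decrease, clipped at lam
    for _ in range(final_steps):             # polish at the target regularization
        _, g, H = loss_grad_hess(w, X, y, lam)
        w = w - np.linalg.solve(H, g)
    return w


# Usage on synthetic, non-separable data.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = np.sign(X @ rng.standard_normal(5) + rng.standard_normal(200))
y[y == 0] = 1.0
w_hat = continuation_newton(X, y, lam=1e-3)
```

The warm start is the key design choice. Each stage starts from the solution of a slightly more regularized problem, so plain Newton steps are taken from a point that is already close to the new minimizer.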

Findings

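1. The algorithm converges linearly, and the constant factor scales only logarithmically with the condition number.

2. In the non-parametric setting, an explicit algorithm that combines the Newton scheme with Nyström projections achieves optimal generalization bounds with optimal complexity and no dependence on the condition number.

3. It is the first large-scale method with theoretical guarantees for logistic and softmax regression in the non-parametric setting with large condition numbers.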
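A simple counting argument shows how a logarithmic dependence on the condition number can arise when the regularization is decreased geometrically. The following is an illustrative calculation for a fixed factor q, not the paper's bound. Its notation is an assumption: μ_0 is the starting regularization, λ the target, and L a bound on the Hessian of the unregularized loss, so the condition number is of order κ = L/λ.

```latex
\mu_t = q^t \mu_0, \quad q \in (0,1),
\qquad
\mu_T \le \lambda \iff T \ge \frac{\log(\mu_0/\lambda)}{\log(1/q)},
\qquad
T = \left\lceil \frac{\log(\mu_0/\lambda)}{\log(1/q)} \right\rceil .
% With \mu_0 = L and \kappa = L/\lambda: T = \lceil \log\kappa / \log(1/q) \rceil = O(\log \kappa).
% Example: \mu_0 = 1, \lambda = 10^{-6}, q = 1/2 gives \log_2(10^6) \approx 19.93, so T = 20.
```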

Abstract

In this paper, we study large-scale convex optimization algorithms based on the Newton method applied to regularized generalized self-concordant losses, which include logistic regression and softmax regression. We first prove that our new simple scheme based on a sequence of problems with decreasing regularization parameters is provably globally convergent, and that this convergence is linear with a constant factor which scales only logarithmically with the condition number. In the parametric setting, we obtain an algorithm with the same scaling as regular first-order methods but with an improved behavior, in particular in ill-conditioned problems. Second, in the non-parametric machine learning setting, we provide an explicit algorithm combining the previous scheme with Nyström projection techniques, and prove that it achieves optimal generalization bounds with a time complexity of…
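A Nyström projection can reduce a kernel (non-parametric) problem to an m-dimensional parametric one. The estimator is restricted to the span of kernel functions centred at m landmark points, and the problem is reparameterized so that the RKHS norm becomes a plain Euclidean norm. Any parametric solver for the regularized loss, such as a Newton continuation scheme, can then be run on the resulting features. This is a minimal sketch that assumes a Gaussian kernel and uniform landmark sampling without replacement. The helpers `gaussian_kernel` and `nystrom_features` and their parameters `sigma` and `jitter` are hypothetical, and the paper's exact construction may differ.

```python
import numpy as np


def gaussian_kernel(A, B, sigma=1.0):
    # k(a, b) = exp(-||a - b||^2 / (2 sigma^2)); clip tiny negative distances from rounding.
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma**2))


def nystrom_features(X, m, sigma=1.0, jitter=1e-10, rng=None):
    """Hypothetical helper: return Z = K_nm T^{-1} with K_mm = T^T T, for m
    landmarks drawn uniformly without replacement. A linear model beta on Z equals
    f = sum_j alpha_j k(x~_j, .) with alpha = T^{-1} beta and ||f||_H = ||beta||."""
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(X.shape[0], size=m, replace=False)
    landmarks = X[idx]
    K_mm = gaussian_kernel(landmarks, landmarks, sigma)
    K_nm = gaussian_kernel(X, landmarks, sigma)
    # Cholesky factorization K_mm + jitter * I = L L^T, so T = L^T.
    L = np.linalg.cholesky(K_mm + jitter * np.eye(m))
    # Z = K_nm L^{-T}, computed as (L^{-1} K_nm^T)^T.
    Z = np.linalg.solve(L, K_nm.T).T
    return Z, idx


# Usage: one row per data point, m columns.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 3))
Z, idx = nystrom_features(X, m=20, rng=rng)
# Z @ Z.T equals K_nm K_mm^{-1} K_mn (up to jitter), the Nyström approximation of the full kernel matrix.
```

With this reparameterization, the regularized problem over the m coefficients has the same form as the parametric one. The cost per Newton step then depends on m rather than on the number of observations squared.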

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Statistical Methods and Inference · Sparse and Compressive Sensing Techniques

Methods: Logistic Regression · Softmax