On the asymptotic rate of convergence of Stochastic Newton algorithms and their Weighted Averaged versions

Claire Boyer (LPSM); Antoine Godichon-Baggioni (LPSM)

arXiv:2011.09706·math.ST·June 30, 2023·Comput. Optim. Appl.

TL;DR

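This paper introduces a general stochastic Newton algorithm and its weighted averaged version for risk minimization from streaming samples. In several use cases, both avoid inverting a Hessian estimate at each iteration by updating the estimate of the inverse Hessian directly. The paper establishes almost sure convergence, rates of convergence, and central limit theorems for the resulting estimates.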

Contribution

The paper generalizes the direct inverse Hessian update introduced in [2] for logistic regression, and provides a convergence analysis and practical algorithms for broad classes of regression models.
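The direct update of an inverse Hessian estimate can be illustrated with the Sherman-Morrison formula. The following is a minimal sketch, assuming the Hessian estimate grows by rank-one terms of the form `a * x x^T` (as it does for linear or logistic regression). The function name `sherman_morrison_update` and all variable names are hypothetical and not taken from the paper.

```python
import numpy as np

def sherman_morrison_update(S_inv, x, a=1.0):
    """Return the inverse of (S + a * x x^T), given S_inv = S^{-1}.

    Assumes S is symmetric positive definite and a >= 0, so the
    denominator below is at least 1 and the update is well defined.
    """
    Sx = S_inv @ x                      # S^{-1} x (S_inv is symmetric)
    denom = 1.0 + a * (x @ Sx)          # 1 + a * x^T S^{-1} x
    return S_inv - (a / denom) * np.outer(Sx, Sx)

# Compare the recursive inverse with a direct inversion on random data.
rng = np.random.default_rng(0)
d = 4
S = np.eye(d)        # running (unnormalized) Hessian estimate, S_0 = I
S_inv = np.eye(d)    # its inverse, maintained without calling inv()
for _ in range(50):
    x = rng.normal(size=d)
    a = rng.uniform(0.1, 1.0)           # illustrative positive weight
    S += a * np.outer(x, x)
    S_inv = sherman_morrison_update(S_inv, x, a)

# Largest entrywise gap between the recursive and the direct inverse.
max_err = np.max(np.abs(S_inv - np.linalg.inv(S)))
```

Each update costs O(d^2) operations, compared with O(d^3) for inverting the Hessian estimate from scratch at every iteration.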

Findings

01

Both algorithms converge almost surely under mild assumptions, such as local strong convexity at the optimum.

02

Weighted averaging improves convergence stability.

03

Outperforms existing methods in simulations with poor initializations.

Abstract

The majority of machine learning methods can be regarded as the minimization of an unavailable risk function. To optimize the latter, given samples provided in a streaming fashion, we define a general stochastic Newton algorithm and its weighted average version. In several use cases, both implementations will be shown not to require the inversion of a Hessian estimate at each iteration; a direct update of the estimate of the inverse Hessian will be favored instead. This generalizes a trick introduced in [2] for the specific case of logistic regression. Under mild assumptions such as local strong convexity at the optimum, we establish almost sure convergence and rates of convergence of the algorithms, as well as central limit theorems for the constructed parameter estimates. The unified framework considered in this paper…
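The streaming scheme described in the abstract can be sketched for the simplest case, least-squares linear regression. This is an illustration under stated assumptions, not the paper's algorithm: the function name `stochastic_newton_linear`, the initialization `S_0 = I`, the 1/n step folded into the unnormalized inverse, and the logarithmic averaging weights `log(k + 2) ** omega` are all choices made here for the example. The paper's exact step sizes and weights are not reproduced.

```python
import numpy as np

def stochastic_newton_linear(X, y, omega=2.0):
    """One pass of a stochastic Newton sketch for least squares.

    Returns the last iterate and a weighted average of the iterates.
    The weights log(k + 2) ** omega are an assumption of this sketch.
    """
    n, d = X.shape
    theta = np.zeros(d)       # current iterate
    theta_avg = np.zeros(d)   # weighted average of the iterates
    S_inv = np.eye(d)         # inverse of S_0 = I (unnormalized Hessian)
    weight_sum = 0.0
    for k in range(n):
        x, target = X[k], y[k]
        # Rank-one (Sherman-Morrison) update of the inverse of S + x x^T.
        Sx = S_inv @ x
        S_inv = S_inv - np.outer(Sx, Sx) / (1.0 + x @ Sx)
        # Gradient of 0.5 * (target - x^T theta)^2 at the current iterate.
        grad = -(target - x @ theta) * x
        # Newton-type step preconditioned by the inverse Hessian estimate.
        theta = theta - S_inv @ grad
        # Recursive weighted average, giving more weight to late iterates.
        w = np.log(k + 2.0) ** omega
        weight_sum += w
        theta_avg = theta_avg + (w / weight_sum) * (theta - theta_avg)
    return theta, theta_avg

# Synthetic streaming data; theta_true is a made-up parameter.
rng = np.random.default_rng(0)
n, d = 2000, 3
theta_true = np.array([1.0, -2.0, 0.5])
X = rng.normal(size=(n, d))
y = X @ theta_true + 0.1 * rng.normal(size=n)
theta_last, theta_avg = stochastic_newton_linear(X, y)
```

In this special case the last iterate coincides with recursive least squares, that is, with the ridge solution `(I + X^T X)^{-1} X^T y`. That makes the sketch easy to check against a direct solve.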
