Frugality in second-order optimization: floating-point approximations for Newton's method

Giuseppe Carrino; Elena Loli Piccolomini; Elisa Riccietti; Theo Mary

arXiv:2511.17660·cs.LG·November 25, 2025

Open Access

TL;DR

This paper explores how finite-precision arithmetic affects Newton's method in machine learning, providing convergence guarantees and empirical evidence that mixed-precision and partial second-order methods can outperform first-order optimizers like Adam.

Contribution

The paper establishes a convergence theorem for mixed-precision Newton methods, including quasi-Newton and inexact variants, and proposes GN_k, a generalized Gauss-Newton method that computes second-order derivatives only partially in order to reduce computational cost.

Findings

1. Mixed-precision Newton methods outperform Adam on the Australian and MUSH regression benchmarks.
2. GN_k achieves similar performance to full Newton with fewer derivative evaluations.
3. The convergence theorem covers finite-precision Newton variants and gives a priori estimates of the achievable solution accuracy.

Abstract

Minimizing loss functions is central to machine-learning training. Although first-order methods dominate practical applications, higher-order techniques such as Newton's method can deliver greater accuracy and faster convergence, yet are often avoided due to their computational cost. This work analyzes the impact of finite-precision arithmetic on Newton steps and establishes a convergence theorem for mixed-precision Newton optimizers, including "quasi" and "inexact" variants. The theorem provides not only convergence guarantees but also a priori estimates of the achievable solution accuracy. Empirical evaluations on standard regression benchmarks demonstrate that the proposed methods outperform Adam on the Australian and MUSH datasets. The second part of the manuscript introduces GN_k, a generalized Gauss-Newton method that enables partial computation of second-order derivatives. GN_k…
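To make the idea of a mixed-precision Newton step concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation. The test problem (L2-regularized logistic regression), the function names `loss_grad_hess` and `mixed_precision_newton`, the regularization value, and the split of work between precisions (gradient and iterate in float64, linear solve in float32) are all assumptions chosen for illustration. The abstract does not state which operations the paper runs in which precision.

```python
import numpy as np


def loss_grad_hess(w, X, y, reg=1e-3):
    """L2-regularized logistic loss, gradient and Hessian, all in float64.

    Labels y are assumed to be in {0, 1}. The value of `reg` is an
    illustrative assumption, not a setting taken from the paper.
    """
    n = len(y)
    z = X @ w
    p = 1.0 / (1.0 + np.exp(-z))  # predicted probabilities
    loss = np.mean(np.logaddexp(0.0, z) - y * z) + 0.5 * reg * (w @ w)
    grad = X.T @ (p - y) / n + reg * w
    # X^T diag(p(1-p)) X / n, plus the regularization term
    hess = (X.T * (p * (1.0 - p))) @ X / n + reg * np.eye(len(w))
    return loss, grad, hess


def mixed_precision_newton(X, y, iters=20, low=np.float32):
    """Newton iteration whose linear solve runs in a lower precision.

    Assumed precision split (hypothetical, for illustration only):
      - loss, gradient, Hessian assembly and the iterate: float64
      - the solve H s = g: `low` (float32 by default)
    np.linalg.solve does not accept float16, so `low` should be
    float32 or float64.
    """
    w = np.zeros(X.shape[1], dtype=np.float64)
    history = []
    for _ in range(iters):
        loss, grad, hess = loss_grad_hess(w, X, y)
        history.append(loss)
        # Cast to low precision, solve, then cast the step back up.
        step = np.linalg.solve(hess.astype(low), grad.astype(low))
        w = w - step.astype(np.float64)
    return w, history
```

Because the gradient is still evaluated in float64, the low-precision solve behaves like an inexact Newton step: each iteration removes most of the remaining error, so on a well-conditioned problem the iterate can still approach the float64 solution. Calling `mixed_precision_newton(X, y, low=np.float64)` gives an all-double baseline for comparison. The sketch takes full steps with no line search, so it is only suitable for small, well-behaved problems.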

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Iterative Methods for Nonlinear Equations · Numerical Methods and Algorithms