Low-Order Explicit Hessian Imitation Method for Large-Scale Supervised Machine Learning

Yunlang Zhu; Lingjun Guo; Zahra Khatti; Xiaoyi Qu; Chia-Yuan Wu; Lara Zebiane; Frank E. Curtis

arXiv:2605.06945 · math.OC · May 11, 2026


TL;DR

This paper introduces an optimization algorithm for neural network training that uses an auxiliary loss to approximate second-order derivatives cheaply, and that outperforms Adam in certain scenarios.

Contribution

The paper presents a low-order Hessian imitation method utilizing an auxiliary loss to create efficient second-derivative approximations for large-scale supervised learning.

Findings

01

The proposed method provides convergence guarantees similar to existing stochastic diagonal-scaling methods.

02

Numerical experiments show the algorithm can outperform Adam and other optimizers in certain scenarios.

03

The approach maintains computational cost comparable to Adam while incorporating second-order information.

Abstract

An algorithm is proposed for solving optimization problems arising in neural network training for supervised learning. The unique feature of the algorithm is the use of an auxiliary loss, in addition to the original loss employed for model training. The purpose of the auxiliary loss is to provide a mechanism for creating a low-order Hessian-type approximation for the original loss. The proposed algorithm employs the resulting low-order second-derivative approximation terms in place of the second-order momentum terms (i.e., squared elements of the gradient of the loss function) in an overall scheme that has computational cost on par with an Adam-type approach. Whereas the squared elements of a gradient vector do not necessarily approximate second-order derivatives well, by careful construction of the auxiliary loss, second-order derivative-type approximations for the original loss can be…
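To make the structural idea in the abstract concrete, here is a minimal sketch of an Adam-type step in which the per-coordinate scaling term is a pluggable input rather than being fixed to the squared gradient. It is not the paper's algorithm: the names `new_state` and `diag_scaled_step` are hypothetical, the `curv` argument stands in for whatever nonnegative second-derivative-type estimate the auxiliary loss would supply (its construction is not given in this excerpt), and keeping Adam's square root and bias correction is an assumption.

```python
import numpy as np


def new_state(n):
    """Create optimizer state for n parameters (hypothetical helper)."""
    return {"t": 0, "m": np.zeros(n), "v": np.zeros(n)}


def diag_scaled_step(x, grad, curv, state,
                     lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam-type step with a pluggable diagonal scaling term.

    curv is any elementwise nonnegative vector. Passing grad**2
    recovers the standard Adam update. A Hessian-type estimate would
    be passed here instead; how the paper derives it from the
    auxiliary loss is NOT shown in this sketch.
    """
    state["t"] += 1
    t = state["t"]
    # Exponential moving averages of the gradient and the scaling term.
    state["m"] = beta1 * state["m"] + (1.0 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1.0 - beta2) * curv
    # Bias correction, as in Adam.
    m_hat = state["m"] / (1.0 - beta1 ** t)
    v_hat = state["v"] / (1.0 - beta2 ** t)
    # Assumption: the square root is kept, mirroring Adam's form.
    return x - lr * m_hat / (np.sqrt(v_hat) + eps)


# Usage: the same step with Adam's scaling and with a stand-in estimate.
x = np.array([1.0, -2.0])
g = np.array([0.5, -4.0])
x_adam = diag_scaled_step(x, g, g ** 2, new_state(2))
x_alt = diag_scaled_step(x, g, np.full(2, 4.0), new_state(2))
```

Both calls cost the same per step, since only the vector fed into the second moving average changes. Any extra cost in the actual method would come from computing the curvature estimate itself.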
