Error whitening: Why Gauss-Newton outperforms Newton

Maricela Best McKay; Nathan P. Lawrence; Brian Wetton; R. Bhushan Gopaluni

arXiv:2605.11316 · cs.LG · May 13, 2026


TL;DR

This paper explains why Gauss-Newton descent outperforms Newton's method by analyzing both in function space and introducing the concept of error whitening. Empirical case studies support the analysis.

Contribution

The paper provides a function space perspective showing that Gauss-Newton's error whitening property distinguishes it from Newton's method, and validates this empirically in case studies against Newton, Adam, and Muon.

Findings

01

Gauss-Newton projects the loss gradient onto the model's tangent space, removing parameterization distortions.

02

Error whitening replaces the $JJ^\top$ matrix with the identity, simplifying the dynamics.

03

Gauss-Newton optimizers outperform Newton, Adam, and Muon in multiple case studies.
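
The second finding can be illustrated in the simplest setting: a least squares loss under continuous-time flow. The following is a sketch under those assumptions, not the paper's own derivation. The symbols $f$, $\theta$, $y$, $e$, and $P$ are introduced here for illustration, and $J^{+}$ denotes the Moore-Penrose pseudoinverse.

```latex
\begin{align*}
L(\theta) &= \tfrac{1}{2}\lVert e \rVert^{2},
  \qquad e = f(\theta) - y,
  \qquad J = \frac{\partial f}{\partial \theta} \\[4pt]
\text{gradient flow:}\quad
  \dot{\theta} &= -J^\top e
  \;\;\Longrightarrow\;\;
  \dot{e} = J\dot{\theta} = -JJ^\top e \\[4pt]
\text{Gauss-Newton flow:}\quad
  \dot{\theta} &= -(J^\top J)^{+} J^\top e = -J^{+} e
  \;\;\Longrightarrow\;\;
  \dot{e} = -JJ^{+} e = -P e
\end{align*}
```

Here $P = JJ^{+}$ is the orthogonal projector onto the range of $J$. When $J$ has full row rank, $P$ is the identity and every component of the mismatch decays at the same rate, whereas under gradient flow the decay rates are set by the eigenvalues of $JJ^\top$.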

Abstract

The Gauss-Newton matrix is widely viewed as a positive semidefinite approximation of the Hessian, yet mounting empirical evidence shows that Gauss-Newton descent outperforms Newton's method. We adopt a function space perspective to analyze this phenomenon. We show that the generalized Gauss-Newton (GGN) matrix projects the Newton direction in function space onto the model's tangent space, while a Jacobian-only variant obtained by applying the least squares Gauss-Newton matrix to non-least squares losses projects the function space loss gradient onto this same tangent space. Both projections eliminate distortions from the model's parameterization. Specifically, the evolution of the prediction-target mismatch depends on the model's parameterization through the matrix $J J^\top$, where $J$ is the Jacobian of the model with respect to its parameters. The projections effectively replace…
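
The dependence of the mismatch on $J J^\top$ can be checked numerically on a toy problem. The sketch below is not the paper's code. It assumes a least squares loss and a hypothetical model that is linear in its parameters, so the Jacobian is a constant matrix and the one-step identities hold exactly. All names (`model`, `residual`, `gradient_step`, `gauss_newton_step`) and the column scaling are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)

n_outputs, n_params = 3, 6  # more parameters than outputs, so J has full row rank

# Hypothetical badly scaled Jacobian: columns differ by orders of magnitude,
# mimicking a distorted parameterization.
column_scales = np.array([1e-2, 1e-1, 1.0, 1.0, 10.0, 100.0])
J = rng.standard_normal((n_outputs, n_params)) * column_scales
y = rng.standard_normal(n_outputs)
theta0 = np.zeros(n_params)


def model(theta):
    # Linear in theta (an assumption of this sketch), so its Jacobian is exactly J.
    return J @ theta


def residual(theta):
    # Prediction-target mismatch e = f(theta) - y.
    return model(theta) - y


def gradient_step(theta, lr):
    # The gradient of 0.5 * ||e||^2 with respect to theta is J^T e.
    return theta - lr * (J.T @ residual(theta))


def gauss_newton_step(theta, lr):
    # Minimum-norm least squares solution of J d = e, i.e. d = pinv(J) e.
    d, *_ = np.linalg.lstsq(J, residual(theta), rcond=None)
    return theta - lr * d


e0 = residual(theta0)
JJt = J @ J.T

# Gradient descent: the mismatch is multiplied by (I - lr * J J^T).
lr_gd = 1.0 / np.linalg.eigvalsh(JJt).max()
e_gd = residual(gradient_step(theta0, lr_gd))

# Gauss-Newton: the mismatch is multiplied by (1 - lr), independent of J.
lr_gn = 0.5
e_gn = residual(gauss_newton_step(theta0, lr_gn))

print("gradient descent mismatch ratio per component:", e_gd / e0)
print("Gauss-Newton mismatch ratio per component:   ", e_gn / e0)
```

In this setup the gradient step shrinks each direction of the mismatch at a rate set by the spectrum of $J J^\top$, while the Gauss-Newton step shrinks all components by the same factor. For a nonlinear model the same statements hold only to first order in the step size.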
