Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks

Jim Zhao; Sidak Pal Singh; Aurelien Lucchi

arXiv:2411.02139 · cs.LG · February 28, 2025

TL;DR

This paper provides a theoretical analysis of the Gauss-Newton matrix's conditioning in neural networks, offering bounds and insights into how architecture affects optimization stability.

Contribution

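It establishes tight bounds on the condition number of the Gauss-Newton matrix for deep linear networks of arbitrary depth and width, extends them to two-layer ReLU networks, and expands the analysis to further architectural components such as residual connections.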

Findings

1. Bounds on the GN condition number for deep linear networks
2. Extension of the analysis to residual and convolutional layers
3. Empirical validation of theoretical bounds

Abstract

The Gauss-Newton (GN) matrix plays an important role in machine learning, most evident in its use as a preconditioning matrix for a wide family of popular adaptive methods to speed up optimization. Besides, it can also provide key insights into the optimization landscape of neural networks. In the context of deep neural networks, understanding the GN matrix involves studying the interaction between different weight matrices as well as the dependencies introduced by the data, thus rendering its analysis challenging. In this work, we take a first step towards theoretically characterizing the conditioning of the GN matrix in neural networks. We establish tight bounds on the condition number of the GN in deep linear networks of arbitrary depth and width, which we also extend to two-layer ReLU networks. We expand the analysis to further architectural components, such as residual connections…
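
As a rough illustration of the quantity the abstract describes, the sketch below builds the GN matrix of a small deep linear network and reports its condition number. It assumes a squared-error loss, for which the GN matrix can be written as G = (1/n) Σ J(x)ᵀ J(x), with J(x) the Jacobian of the network output with respect to all weights. Because G can be singular when the network has more parameters than the end-to-end linear map, the sketch uses the ratio of the largest to the smallest nonzero eigenvalue. These conventions, the function names, the layer sizes, and the random data are assumptions made for illustration and are not taken from the paper.

```python
import numpy as np


def network_jacobian(weights, x):
    """Jacobian of F(x) = W_L ... W_1 x w.r.t. all weights.

    weights is [W_1, ..., W_L]; each W_l is flattened row-major.
    Returns an array of shape (output_dim, total_number_of_weights).
    """
    num_layers = len(weights)
    # prefix[l] = W_l ... W_1 x, with prefix[0] = x (input seen by layer l+1).
    prefix = [x]
    for W in weights:
        prefix.append(W @ prefix[-1])
    # suffix[l] = W_L ... W_{l+1}, with suffix[L] = identity.
    suffix = [None] * (num_layers + 1)
    suffix[num_layers] = np.eye(weights[-1].shape[0])
    for l in range(num_layers - 1, -1, -1):
        suffix[l] = suffix[l + 1] @ weights[l]
    # F = A W b with A = suffix[l+1], b = prefix[l], so dF_p/dW[i, j] = A[p, i] * b[j],
    # which is the Kronecker product of A with the row vector b^T.
    blocks = [np.kron(suffix[l + 1], prefix[l][None, :]) for l in range(num_layers)]
    return np.hstack(blocks)


def gauss_newton(weights, X):
    """GN matrix (1/n) * sum_i J(x_i)^T J(x_i), assuming a squared-error loss."""
    n = X.shape[0]
    J = np.vstack([network_jacobian(weights, x) for x in X])
    return J.T @ J / n


def pseudo_condition_number(G, rtol=1e-10):
    """Largest eigenvalue divided by the smallest eigenvalue above rtol * largest."""
    eig = np.linalg.eigvalsh(G)
    nonzero = eig[eig > rtol * eig.max()]
    return nonzero.max() / nonzero.min()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.standard_normal((100, 4))        # hypothetical inputs, n=100, d=4
    layer_sizes = [4, 8, 8, 2]               # hypothetical widths: d, m_1, m_2, k
    weights = [
        rng.standard_normal((out_dim, in_dim)) / np.sqrt(in_dim)
        for in_dim, out_dim in zip(layer_sizes[:-1], layer_sizes[1:])
    ]
    G = gauss_newton(weights, X)
    print("GN shape:", G.shape)
    print("pseudo condition number:", pseudo_condition_number(G))
```

For a single layer the Jacobian is the Kronecker product of the identity with xᵀ, so G reduces to a block-diagonal copy of the empirical input covariance and its condition number matches that of the data. With more layers the weight matrices enter through the `suffix` and `prefix` products, which is the interaction between weights and data that the abstract refers to.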

Taxonomy

Topics: Neural Networks and Applications
