Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao, Sidak Pal Singh, Aurelien Lucchi

TL;DR
This paper provides a theoretical analysis of the conditioning of the Gauss-Newton (GN) matrix in neural networks, deriving bounds on its condition number and offering insights into how architectural choices affect optimization stability.
Contribution
It establishes the first tight bounds on the condition number of the Gauss-Newton matrix for deep linear networks and two-layer ReLU networks, and extends the analysis to further architectural components such as residual connections.
Findings
Bounds on the GN condition number for deep linear networks
Extension of the analysis to residual and convolutional layers
Empirical validation of the theoretical bounds
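The following is a minimal NumPy sketch of how the quantity bounded in these results can be evaluated numerically for a small deep linear network. It is not the authors' code and does not implement their bounds. It assumes a bias-free network f(X) = W_L ... W_1 X trained with the mean-squared-error loss (1/2n)||f(X) - Y||^2, whose GN matrix is (1/n) J^T J, where J is the Jacobian of the column-major vectorised outputs with respect to the stacked, column-major vectorised weights. The helper names chain, gn_matrix and condition_number are hypothetical. Every column of J lies in the space of end-to-end products (3x4 matrices in the usage example), so the GN matrix is singular; the sketch therefore takes the condition number to be the ratio of the largest to the smallest nonzero eigenvalue, a convention that should be treated as an assumption.

```python
import numpy as np


def chain(mats, dim):
    """Return mats[-1] @ ... @ mats[0], or the dim x dim identity if mats is empty."""
    out = np.eye(dim)
    for W in mats:
        out = W @ out
    return out


def gn_matrix(weights, X):
    """GN matrix (1/n) J^T J of the MSE loss for f(X) = W_L ... W_1 X (no biases).

    weights[l] has shape (d_{l+1}, d_l) and X has shape (d_0, n). Parameters are
    stacked as column-major vec(weights[0]), ..., vec(weights[-1]); outputs as
    column-major vec(f(X)).
    """
    n = X.shape[1]
    blocks = []
    for l, W in enumerate(weights):
        below = chain(weights[:l], X.shape[0]) @ X  # layers before weights[l], applied to X
        above = chain(weights[l + 1:], W.shape[0])  # product of the layers after weights[l]
        # vec(A dW B) = (B^T kron A) vec(dW) for column-major vec
        blocks.append(np.kron(below.T, above))
    J = np.hstack(blocks)  # shape (k * n, number of parameters)
    return J.T @ J / n, J


def condition_number(G, rtol=1e-10):
    """Ratio of the largest to the smallest nonzero eigenvalue of a PSD matrix."""
    eig = np.linalg.eigvalsh(G)
    nonzero = eig[eig > rtol * eig.max()]
    return nonzero.max() / nonzero.min()


# Usage: a 3-layer linear network with widths 4 -> 6 -> 6 -> 3 and 50 random inputs.
rng = np.random.default_rng(0)
dims = [4, 6, 6, 3]
weights = [rng.standard_normal((dims[i + 1], dims[i])) / np.sqrt(dims[i])
           for i in range(len(dims) - 1)]
X = rng.standard_normal((dims[0], 50))
G, J = gn_matrix(weights, X)
print(G.shape)  # (78, 78)
print(condition_number(G))
```

Varying the depth or the hidden widths in dims and re-running is one way to compare a numerically computed condition number against a theoretical bound; the 1/sqrt(d_in) scaling of the random weights is an arbitrary choice made for this illustration.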
Abstract
The Gauss-Newton (GN) matrix plays an important role in machine learning, most evidently as the preconditioning matrix in a wide family of popular adaptive methods used to speed up optimization. It can also provide key insights into the optimization landscape of neural networks. In deep neural networks, understanding the GN matrix requires studying the interaction between different weight matrices as well as the dependencies introduced by the data, which makes its analysis challenging. In this work, we take a first step towards theoretically characterizing the conditioning of the GN matrix in neural networks. We establish tight bounds on the condition number of the GN matrix in deep linear networks of arbitrary depth and width, which we also extend to two-layer ReLU networks. We expand the analysis to further architectural components, such as residual connections…
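The same numerical check can be sketched for a bias-free two-layer ReLU network f(x) = W2 relu(W1 x), again assuming the MSE loss so that the GN matrix is (1/n) J^T J. For sample i, the output differential is W2 diag(m_i) dW1 x_i + dW2 h_i, where h_i = relu(W1 x_i) and m_i is the 0/1 ReLU derivative (taken as 0 at the kink, an assumption); each term becomes one Kronecker-product block of J. Scaling a hidden unit's incoming weights by c > 0 and its outgoing weights by 1/c leaves f unchanged, so J has at least one null direction per hidden unit and the nonzero-eigenvalue convention is used again. The names relu_gn_matrix and condition_number are hypothetical, and this is an illustration rather than the paper's analysis.

```python
import numpy as np


def relu_gn_matrix(W1, W2, X):
    """GN matrix (1/n) J^T J of the MSE loss for f(x) = W2 relu(W1 x), no biases.

    W1: (m, d), W2: (k, m), X: (d, n). Parameters are stacked as column-major
    vec(W1) followed by vec(W2); outputs as column-major vec(f(X)).
    """
    n = X.shape[1]
    k = W2.shape[0]
    pre = W1 @ X                    # pre-activations, shape (m, n)
    H = np.maximum(pre, 0.0)        # hidden activations h_i
    mask = (pre > 0).astype(float)  # ReLU derivative m_i (0 at the kink)
    rows = []
    for i in range(n):
        # df_i = W2 diag(m_i) dW1 x_i + dW2 h_i, written as Kronecker products
        J1 = np.kron(X[:, i][None, :], W2 * mask[:, i][None, :])  # (k, m * d)
        J2 = np.kron(H[:, i][None, :], np.eye(k))                  # (k, k * m)
        rows.append(np.hstack([J1, J2]))
    J = np.vstack(rows)             # shape (k * n, m * d + k * m)
    return J.T @ J / n, J


def condition_number(G, rtol=1e-10):
    """Ratio of the largest to the smallest nonzero eigenvalue of a PSD matrix."""
    eig = np.linalg.eigvalsh(G)
    nonzero = eig[eig > rtol * eig.max()]
    return nonzero.max() / nonzero.min()


# Usage: d = 5 inputs, m = 8 hidden units, k = 3 outputs, n = 40 random samples.
rng = np.random.default_rng(1)
W1 = rng.standard_normal((8, 5)) / np.sqrt(5)
W2 = rng.standard_normal((3, 8)) / np.sqrt(8)
X = rng.standard_normal((5, 40))
G, J = relu_gn_matrix(W1, W2, X)
print(G.shape)  # (64, 64)
print(condition_number(G))
```

The per-sample loop keeps the two Jacobian blocks easy to read; an automatic-differentiation library would be the more scalable choice for larger networks.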
Taxonomy
Topics: Neural Networks and Applications
