Non-Singularity of the Gradient Descent map for Neural Networks with Piecewise Analytic Activations

Alexandru Crăciun; Debarghya Ghoshdastidar

arXiv:2510.24466·math.OC·October 29, 2025

TL;DR

This paper proves that the gradient descent (GD) map is non-singular for realistic neural networks with piecewise analytic activations. Non-singularity is the assumption that prior work relies on to show that GD avoids saddle points and to characterize convergence to global minima.

Contribution

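The paper establishes the non-singularity of the GD map for practical neural network architectures with common activation functions, extending prior results that either assumed non-singularity outright or imposed restrictive conditions such as Lipschitz smoothness of the loss.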

Findings

01. The GD map is non-singular for almost all step-sizes in realistic networks

02. Supports theoretical guarantees for avoiding saddle points

03. Extends convergence analysis to practical neural network settings

Abstract

The theory of training deep networks has become a central question of modern machine learning and has inspired many practical advancements. In particular, the gradient descent (GD) optimization algorithm has been extensively studied in recent years. A key assumption about GD has appeared in several recent works: the GD map is non-singular, that is, it preserves sets of measure zero under preimages. Crucially, this assumption has been used to prove that GD avoids saddle points and maxima, and to establish the existence of a computable quantity that determines the convergence to global minima (both for GD and stochastic GD). However, the current literature either assumes the non-singularity of the GD map or imposes restrictive assumptions, such as Lipschitz smoothness of the loss (for example, Lipschitzness does not hold for deep ReLU networks with the cross-entropy loss) and restricts…
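To make the non-singularity condition concrete, here is a minimal sketch on a toy quadratic loss. It is not the paper's setting (the paper treats neural networks with piecewise analytic activations); the matrix `A` and the helper names `gd_map` and `jacobian_det` are hypothetical choices for illustration. For L(theta) = 0.5 * theta^T A theta, the GD map with step-size eta is the linear map theta -> (I - eta * A) theta, which preserves measure-zero sets under preimages exactly when its determinant is non-zero. That fails only when 1/eta is an eigenvalue of A, so only for finitely many step-sizes.

```python
# Toy illustration (an assumption for exposition, not taken from the paper):
# quadratic loss L(theta) = 0.5 * theta^T A theta with a symmetric 2x2 matrix A,
# so grad L(theta) = A theta and the GD map is G(theta) = (I - eta * A) theta.

A = [[2.0, 0.0],
     [0.0, 0.5]]  # hypothetical Hessian with eigenvalues 2 and 0.5


def gd_map(theta, eta):
    """One gradient descent step on the toy quadratic loss."""
    grad = [A[0][0] * theta[0] + A[0][1] * theta[1],
            A[1][0] * theta[0] + A[1][1] * theta[1]]
    return [theta[0] - eta * grad[0], theta[1] - eta * grad[1]]


def jacobian_det(eta):
    """Determinant of the Jacobian I - eta * A of the GD map."""
    a = 1.0 - eta * A[0][0]
    b = -eta * A[0][1]
    c = -eta * A[1][0]
    d = 1.0 - eta * A[1][1]
    return a * d - b * c


# The determinant vanishes only when 1/eta is an eigenvalue of A,
# which for this A means eta = 0.5 or eta = 2.0.
candidate_step_sizes = (0.1, 0.5, 1.0, 2.0, 3.0)
singular_step_sizes = [eta for eta in candidate_step_sizes
                       if abs(jacobian_det(eta)) < 1e-12]
print(singular_step_sizes)
```

At a singular step-size such as eta = 0.5, the map sends every point (x, y) to (0, 0.75 * y), so the preimage of the measure-zero line x = 0 is the whole plane. This is the behavior that non-singularity rules out.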
