On the existence of infinitely many realization functions of non-global local minima in the training of artificial neural networks with ReLU activation
Shokhrukh Ibragimov, Arnulf Jentzen, Timo Kröger, Adrian Riekert

TL;DR
This paper shows that, in the training of shallow ReLU neural networks, the risk function can have infinitely many distinct realization functions of non-global local minima for suitable target functions, so the structure of the risk landscape depends strongly on the target function.
Contribution
It proves the existence of infinitely many realization functions of non-global local minima in shallow ReLU networks for certain target functions, advancing understanding of critical point structures.
Findings
For certain target functions, there exist uncountably many local minima whose risk is strictly larger than the risk of the global minimum.
For networks with a single hidden neuron, only finitely many realization functions of critical points exist when the target function is piecewise polynomial.
The risk landscape can be highly complex with multiple local minima depending on the target function.
Abstract
Gradient descent (GD) type optimization schemes are the standard instruments to train fully connected feedforward artificial neural networks (ANNs) with rectified linear unit (ReLU) activation and can be considered as temporal discretizations of solutions of gradient flow (GF) differential equations. It has recently been proved that the risk of every bounded GF trajectory converges in the training of ANNs with one hidden layer and ReLU activation to the risk of a critical point. Taking this into account it is one of the key research issues in the mathematical convergence analysis of GF trajectories and GD type optimization schemes, respectively, to study sufficient and necessary conditions for critical points of the risk function and, thereby, to obtain an understanding about the appearance of critical points in dependence of the problem parameters such as the target function. In the…
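The setting described in the abstract can be sketched numerically. The following is a minimal illustration, not the paper's construction: it assumes a one-dimensional input interval, an L^2 risk approximated by trapezoidal quadrature on a uniform grid, and the convention that the ReLU derivative at 0 is 0. The names realization, risk_and_grad, and gd_step are hypothetical helpers introduced here. A GD step is written as one explicit Euler step of the gradient flow.

```python
import numpy as np

def realization(theta, x):
    """Realization function of a one-hidden-layer ReLU network at points x."""
    w, b, v, c = theta
    return np.maximum(np.outer(x, w) + b, 0.0) @ v + c

def risk_and_grad(theta, f, a=0.0, b_end=1.0, n=2001):
    """Approximate the L^2 risk on [a, b_end] and its gradient (assumed grid quadrature)."""
    w, b, v, c = theta
    x = np.linspace(a, b_end, n)
    dx = (b_end - a) / (n - 1)
    q = np.full(n, dx)                 # trapezoidal quadrature weights
    q[0] = q[-1] = dx / 2
    pre = np.outer(x, w) + b           # pre-activations, shape (n, H)
    act = np.maximum(pre, 0.0)         # ReLU outputs
    err = act @ v + c - f(x)           # pointwise error, shape (n,)
    risk = np.sum(q * err**2)
    g = 2.0 * q * err                  # derivative of the risk w.r.t. the network output
    active = (pre > 0).astype(float)   # assumed convention: ReLU'(0) = 0
    grad_w = v * ((active * x[:, None]).T @ g)
    grad_b = v * (active.T @ g)
    grad_v = act.T @ g
    grad_c = np.sum(g)
    return risk, (grad_w, grad_b, grad_v, grad_c)

def gd_step(theta, grads, lr):
    """One GD step, i.e. one explicit Euler step of size lr for the gradient flow."""
    return tuple(p - lr * g for p, g in zip(theta, grads))

if __name__ == "__main__":
    # Target f(x) = x on [0, 1]; both hidden neurons are inactive on the whole interval.
    theta = (np.array([-1.0, -2.0]), np.array([-0.5, -0.1]),
             np.array([1.0, -3.0]), 0.5)
    risk, grads = risk_and_grad(theta, lambda x: x)
    print(risk, [np.max(np.abs(g)) for g in grads])
```

In this example all hidden neurons are inactive on [0, 1] and the output bias equals the mean of the target, so every partial derivative vanishes under the stated convention. The parameter is therefore a critical point whose realization is a constant function, although a piecewise linear network could fit f(x) = x exactly. Whether such a point is a local minimum is not addressed by this sketch.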
Taxonomy
Topics: Model Reduction and Neural Networks · Advanced Numerical Analysis Techniques · Neural Networks and Applications
