On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

Steffen Dereich; Arnulf Jentzen; Sebastian Kassing

arXiv:2302.14690·math.OC·November 20, 2024·1 cites

Open Access

TL;DR

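This paper proves that minimizers exist in the loss landscape of shallow residual ReLU neural networks with multi-dimensional input, in contrast to prior results showing that minimizers often fail to exist for common smooth activation functions. The proof works in a closure of the search space that adds discontinuous generalized responses, and then shows that these additional responses are suboptimal.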

Contribution

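It establishes the existence of minimizers for residual ReLU networks with one hidden layer, whereas earlier work showed that minimizers do not exist in many situations for common smooth activations, even for polynomial target functions. The argument relies on a closure of the search space rather than on the original function class alone.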

Findings

01

Minimizers exist in the loss landscape for residual ReLU networks.

02

The closure of the search space adds discontinuous generalized responses, but these are suboptimal, so the minimum is attained in the original function class.

03

Contrasts with prior results showing non-existence for smooth activations.

Abstract

In this article, we show existence of minimizers in the loss landscape for residual artificial neural networks (ANNs) with multi-dimensional input layer and one hidden layer with ReLU activation. Our work contrasts earlier results in [D. Gallon, A. Jentzen, and F. Lindner, preprint, arXiv:2211.15641, 2022] and [P. Petersen, M. Raslan, and F. Voigtlaender, Found. Comput. Math., 21 (2021), pp. 375-444] which showed that in many situations minimizers do not exist for common smooth activation functions even in the case where the target functions are polynomials. The proof of the existence property makes use of a closure of the search space containing all functions generated by ANNs and additional discontinuous generalized responses. As we will show, the additional generalized responses in this larger space are suboptimal so that the minimum is attained in the original function class.
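
To make the objects in the abstract concrete, here is a minimal Python sketch. It is not taken from the paper: the parameterization (an affine skip term plus one hidden ReLU layer), the empirical squared loss, and all function and parameter names are assumptions chosen for illustration. The last function shows one standard way a discontinuous function can appear in the closure of ReLU network responses: a two-neuron ramp whose weights diverge tends pointwise to a step function, which no continuous ReLU network realizes exactly.

```python
from typing import Sequence


def relu(z: float) -> float:
    """ReLU activation."""
    return max(z, 0.0)


def residual_relu_response(
    x: Sequence[float],
    hidden_weights: Sequence[Sequence[float]],  # v_j, one vector per hidden neuron
    hidden_biases: Sequence[float],             # b_j
    outer_weights: Sequence[float],             # w_j
    skip_weights: Sequence[float],              # a (assumed linear skip connection)
    output_bias: float,                         # c
) -> float:
    """Assumed form: a.x + c + sum_j w_j * relu(v_j.x + b_j)."""
    skip = sum(a * xi for a, xi in zip(skip_weights, x))
    hidden = sum(
        w * relu(sum(vi * xi for vi, xi in zip(v, x)) + b)
        for v, b, w in zip(hidden_weights, hidden_biases, outer_weights)
    )
    return skip + output_bias + hidden


def empirical_squared_loss(inputs, targets, **params) -> float:
    """Mean squared error over a finite sample (an illustrative choice of loss)."""
    errors = [
        (residual_relu_response(x, **params) - y) ** 2
        for x, y in zip(inputs, targets)
    ]
    return sum(errors) / len(errors)


def steep_ramp(x: float, n: float) -> float:
    """Two-neuron ReLU network n*relu(x + 1/n) - n*relu(x).

    It equals 0 for x <= -1/n and 1 for x >= 0, so as n grows it tends
    pointwise to the step function that is 1 on [0, inf) and 0 elsewhere.
    """
    return n * relu(x + 1.0 / n) - n * relu(x)
```

Evaluating `steep_ramp(x, n)` for increasing `n` at a fixed negative `x` eventually returns 0, while any `x >= 0` returns 1 up to rounding. The limit is therefore a jump that lies outside the original class, which is the kind of additional response a closure has to account for.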

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM