Normalized gradient flow optimization in the training of ReLU artificial neural networks

Simon Eberle; Arnulf Jentzen; Adrian Riekert; Georg Weiss

arXiv:2207.06246·math.OC·July 14, 2022

Open Access

TL;DR

This paper introduces a modified gradient flow approach for training ReLU neural networks by restricting dynamics to a special submanifold, leading to better regularity and boundedness properties in shallow networks.

Contribution

It proposes a new submanifold-based gradient flow method for ReLU ANNs and proves that its trajectories are bounded for shallow networks, a property that remains an open problem for the standard gradient flow.

Findings

01. Gradient flow restricted to a submanifold shows improved regularity.

02. Boundedness of gradient trajectories is proven for shallow networks with Lipschitz targets.

03. Standard gradient flow's boundedness remains an open problem.
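
For orientation, the objects named in these findings can be written in generic form. This is a sketch rather than the paper's own notation: the symbols $\Theta$ (parameter trajectory), $\mathcal{L}$ (risk function), and $\mathcal{M}$ (submanifold) are assumptions introduced here. Because a ReLU risk is not differentiable everywhere, $\nabla \mathcal{L}$ should be read as a suitably generalized gradient.

$$
\frac{d}{dt}\Theta(t) = -\nabla \mathcal{L}(\Theta(t)), \qquad \sup_{t \ge 0} \|\Theta(t)\| < \infty .
$$

The first relation is the negative gradient flow. The second states what global boundedness of a trajectory means. Restricting the flow to a submanifold $\mathcal{M}$ amounts to requiring $\Theta(t) \in \mathcal{M}$ for all $t \ge 0$, with the right-hand side replaced by its component tangent to $\mathcal{M}$ in the usual Riemannian sense.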

Abstract

The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry. Roughly speaking, ANNs can be regarded as iterated compositions between affine linear functions and certain fixed nonlinear functions, which are usually multidimensional versions of a one-dimensional so-called activation function. The most popular choice of such a one-dimensional activation function is the rectified linear unit (ReLU) activation function which maps a real number to its positive part $\mathbb{R} \ni x \mapsto \max\{x, 0\} \in \mathbb{R}$. In this article we propose and analyze a modified variant of the standard training procedure of such ReLU ANNs in the sense that we propose to restrict the negative gradient flow dynamics to a large submanifold of the ANN parameter space, which is a strict $C^{\infty}$-submanifold…
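
The abstract's description of an ANN as an iterated composition of affine linear maps and a componentwise ReLU can be sketched in a few lines of plain Python. This is only an illustration, not code from the paper: the function names `relu`, `affine`, and `realization`, the list-of-rows weight layout, and the example network are all assumptions made for this sketch.

```python
def relu(x):
    """Rectified linear unit: map a real number to its positive part max{x, 0}."""
    return max(x, 0.0)


def affine(weights, bias, x):
    """Affine linear map x -> W x + b, with W given as a list of rows."""
    return [
        sum(w * xi for w, xi in zip(row, x)) + b
        for row, b in zip(weights, bias)
    ]


def realization(layers, x):
    """Evaluate an ANN given as a list of (weights, bias) pairs.

    ReLU is applied componentwise after every affine map except the last,
    so the network is an iterated composition of affine maps and ReLU.
    """
    for i, (weights, bias) in enumerate(layers):
        x = affine(weights, bias, x)
        if i < len(layers) - 1:
            x = [relu(v) for v in x]
    return x


# Hypothetical shallow network: 1 input, 2 hidden neurons, 1 output.
layers = [
    ([[1.0], [-1.0]], [0.0, 0.0]),  # hidden layer: x -> (x, -x)
    ([[1.0, 1.0]], [0.0]),          # output layer: sum of hidden activations
]

# relu(x) + relu(-x) equals |x|, so this network realizes the absolute value.
print(realization(layers, [-3.0]))
```

The example network has a single hidden layer, which is the "shallow" setting the summary refers to. Deeper networks are obtained by adding more `(weights, bias)` pairs to `layers`.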

Taxonomy

Topics: Neural Networks and Applications · Model Reduction and Neural Networks · Advanced Numerical Analysis Techniques