Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation
Arnulf Jentzen, Adrian Riekert

TL;DR
This paper analyzes gradient flows arising in the training of neural networks with ReLU activation. It proves that, under various conditions, the risk along a gradient flow trajectory converges either to the risk of a critical point or to zero, which gives theoretical insight into the training dynamics.
Contribution
It offers a rigorous convergence analysis of gradient flow differential equations for three-layer ReLU neural networks, including risk convergence results for specific target functions and data distributions.
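For orientation, the setting can be written out as follows. This is a sketch in notation assumed here rather than taken from the paper: a scalar output, H hidden neurons, parameters \theta = (w, b, v, c), target function f, input distribution \mu, and \mathcal{G} standing for a suitably generalized gradient of the risk, since ReLU is not differentiable at 0.

```latex
% Three-layer (one hidden layer) ReLU network; notation is illustrative
\mathcal{N}_\theta(x) = c + \sum_{i=1}^{H} v_i \max\{ \langle w_i, x \rangle + b_i,\, 0 \}

% Risk: mean squared error against the target f under the input distribution \mu
\mathcal{L}(\theta) = \int \bigl| \mathcal{N}_\theta(x) - f(x) \bigr|^2 \, \mu(\mathrm{d}x)

% Gradient flow trajectory started at \Theta_0
\Theta_t = \Theta_0 - \int_0^t \mathcal{G}(\Theta_s) \, \mathrm{d}s , \qquad t \ge 0
```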
Findings
The risk of every bounded GF trajectory converges to the risk of a critical point.
For 1D affine target functions, the risk converges to zero if the initial risk is sufficiently small.
With a single hidden neuron, the risk also converges to zero for GF trajectories that are not bounded.
Abstract
Gradient descent (GD) type optimization schemes are the standard methods to train artificial neural networks (ANNs) with rectified linear unit (ReLU) activation. Such schemes can be considered as discretizations of gradient flows (GFs) associated to the training of ANNs with ReLU activation and most of the key difficulties in the mathematical convergence analysis of GD type optimization schemes in the training of ANNs with ReLU activation seem to be already present in the dynamics of the corresponding GF differential equations. It is the key subject of this work to analyze such GF differential equations in the training of ANNs with ReLU activation and three layers (one input layer, one hidden layer, and one output layer). In particular, in this article we prove in the case where the target function is possibly multi-dimensional and continuous and in the case where the probability…
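The relation between GD and GF described in the abstract can be illustrated numerically. The following is a minimal sketch, not the paper's method: it applies explicit Euler steps (that is, plain gradient descent) to the risk of a one-hidden-layer ReLU network with a 1D affine target. The function names, the grid approximation of the uniform distribution on [0, 1], the target 2x + 1, the hidden width, the step size, and the convention of setting the ReLU derivative to 0 at 0 are all assumptions made for this example.

```python
import numpy as np

def risk_and_grad(theta, x, y):
    """Mean squared risk of a one-hidden-layer ReLU network and its gradient.

    theta = (w, b, v, c): hidden weights, hidden biases, output weights, output bias.
    The ReLU derivative at 0 is taken to be 0 (an assumed convention).
    """
    w, b, v, c = theta
    pre = np.outer(x, w) + b            # pre-activations, shape (n, H)
    act = np.maximum(pre, 0.0)          # ReLU
    err = act @ v + c - y               # residuals, shape (n,)
    risk = np.mean(err ** 2)
    g_out = 2.0 * err / x.size          # d risk / d network output
    g_pre = np.outer(g_out, v) * (pre > 0)
    grad = (x @ g_pre, g_pre.sum(axis=0), act.T @ g_out, g_out.sum())
    return risk, grad

def euler_gradient_flow(theta, x, y, step=5e-3, n_steps=2000):
    """Explicit Euler discretization of the gradient flow (plain gradient descent)."""
    risks = []
    for _ in range(n_steps):
        risk, grad = risk_and_grad(theta, x, y)
        risks.append(risk)
        theta = tuple(p - step * g for p, g in zip(theta, grad))
    risks.append(risk_and_grad(theta, x, y)[0])
    return theta, np.array(risks)

# Hypothetical setup: grid on [0, 1] standing in for the uniform distribution,
# affine target f(x) = 2x + 1, four hidden neurons, random initialization.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 201)
y = 2.0 * x + 1.0
H = 4
theta0 = (rng.normal(size=H), rng.normal(size=H), rng.normal(size=H), 0.0)
theta, risks = euler_gradient_flow(theta0, x, y)
```

Shrinking `step` while increasing `n_steps` proportionally brings the discrete iterates closer to the continuous-time trajectory. The array `risks` records the risk along the discretized path, so it can be inspected for the kind of convergence behavior the paper studies.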
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Model Reduction and Neural Networks · Machine Learning and ELM
