A global convergence theory for deep ReLU implicit networks via over-parameterization

Tianxiang Gao; Hailiang Liu; Jia Liu; Hridesh Rajan; and Hongyang Gao

arXiv:2110.05645 · cs.LG · February 21, 2022 · 5 cites

TL;DR

This paper establishes a global convergence theory for over-parameterized deep ReLU implicit neural networks, demonstrating that gradient descent converges linearly to a global minimum even with infinitely many layers.

Contribution

It provides the first theoretical guarantee of convergence for implicit neural networks with infinite layers under over-parameterization.

Findings

01. Gradient descent converges linearly to a global minimum.

02. Convergence holds for networks with infinitely many layers.

03. Results apply to over-parameterized ReLU implicit neural networks.

Abstract

Implicit deep learning has received increasing attention recently because it generalizes the recursive prediction rules of many commonly used neural network architectures. Its prediction rule is given implicitly as the solution of an equilibrium equation. Although a line of recent empirical studies has demonstrated its superior performance, the theoretical understanding of implicit neural networks remains limited. In general, the equilibrium equation may not be well-posed during training. As a result, there is no guarantee that vanilla (stochastic) gradient descent (SGD) converges when training nonlinear implicit neural networks. This paper fills the gap by analyzing the gradient flow of Rectified Linear Unit (ReLU) activated implicit neural networks. For an $m$-width implicit neural network with ReLU activation and $n$ training samples, we show that a randomly…
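To make the phrase "solution of an equilibrium equation" concrete, here is a minimal sketch of an implicit ReLU layer whose output is the fixed point of z = relu(A z + B x). It is an illustration only and not the paper's parameterization: the names `A`, `B`, and `implicit_forward` are hypothetical, and the rescaling of `A` to infinity norm 0.5 is an assumption made here so that the map is a contraction and the equilibrium is well-posed.

```python
import random


def relu(v):
    """Apply ReLU elementwise to a list of floats."""
    return [max(t, 0.0) for t in v]


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in M]


def implicit_forward(A, B, x, tol=1e-12, max_iter=1000):
    """Solve z = relu(A z + B x) by fixed-point iteration (hypothetical helper)."""
    bx = matvec(B, x)
    z = [0.0] * len(A)
    for _ in range(max_iter):
        z_next = relu([s + t for s, t in zip(matvec(A, z), bx)])
        # Stop once successive iterates agree in the infinity norm.
        if max(abs(p - q) for p, q in zip(z_next, z)) <= tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


random.seed(0)
m, d = 16, 4  # width and input dimension, chosen arbitrarily for the demo

# Assumption: rescale A so its infinity norm (max absolute row sum) is 0.5.
# ReLU is 1-Lipschitz, so z -> relu(A z + B x) is then a contraction.
A = [[random.gauss(0, 1) for _ in range(m)] for _ in range(m)]
row_sum = max(sum(abs(a) for a in row) for row in A)
A = [[0.5 * a / row_sum for a in row] for row in A]

B = [[random.gauss(0, 1) for _ in range(d)] for _ in range(m)]
x = [random.gauss(0, 1) for _ in range(d)]

z_star = implicit_forward(A, B, x)

# Residual of the equilibrium equation at the returned point.
fz = relu([s + t for s, t in zip(matvec(A, z_star), matvec(B, x))])
residual = max(abs(p - q) for p, q in zip(z_star, fz))
print("equilibrium residual:", residual)
```

Without a constraint of this kind on `A`, the iteration may fail to converge or the equilibrium may not be unique, which is the well-posedness concern the abstract raises for training.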

Videos

A global convergence theory for deep ReLU implicit networks via over-parameterization · SlidesLive

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning