Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths

Tianxiang Gao; Hongyang Gao

arXiv:2205.07463·cs.LG·May 17, 2022·1 cites

TL;DR

This paper analyzes the convergence of gradient-based methods on infinite-depth ReLU implicit networks, demonstrating linear convergence when the network width scales linearly with the sample size.
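For readers unfamiliar with the term, "linear convergence" is usually taken to mean that the training loss shrinks geometrically. The block below is only a sketch of that standard definition: the loss L, the parameters \theta, the rate constant c, and the step size \eta are notation assumed here for illustration, not constants or bounds quoted from the paper.

```latex
% Gradient flow (continuous time), with a hypothetical rate constant c > 0:
L(\theta(t)) \le e^{-c t}\, L(\theta(0)), \qquad t \ge 0.

% Gradient descent with step size \eta, assuming 0 < \eta c < 1:
L(\theta_k) \le (1 - \eta c)^k\, L(\theta_0), \qquad k = 0, 1, 2, \dots
```

Under either bound the loss drops by a fixed factor per unit of time or per iteration, so the number of steps needed to reach accuracy \epsilon grows only like \log(1/\epsilon).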

Contribution

The paper introduces a scaling of the weight matrix that ensures the equilibrium equation stays well-posed, and proves that gradient flow and gradient descent converge for infinite-depth implicit networks whose width is linear in the sample size.

Findings

01. Gradient flow and gradient descent converge to a global minimum.

02. Convergence is linear when the network width scales linearly with the sample size.

03. Scaling the weight matrix ensures well-posedness during training.
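The well-posedness finding above can be illustrated numerically. The following is a minimal sketch, not the paper's construction: it assumes an equilibrium equation of the common form z = relu(A z + u), and it rescales A so that its spectral norm equals a chosen gamma < 1, which is one simple way to make the map a contraction. The names `scale_to_contraction`, `solve_equilibrium`, `gamma`, and the dimensions are all hypothetical.

```python
import numpy as np


def relu(v):
    """Elementwise ReLU, which is 1-Lipschitz."""
    return np.maximum(v, 0.0)


def scale_to_contraction(A, gamma=0.5):
    """Rescale A so its spectral norm equals gamma (assumed 0 < gamma < 1).

    This normalization is an assumption made for the sketch; the paper's
    exact scaling is not given in this summary.
    """
    spectral_norm = np.linalg.norm(A, 2)  # largest singular value
    return (gamma / spectral_norm) * A


def solve_equilibrium(A_scaled, u, tol=1e-10, max_iter=1000):
    """Find z with z = relu(A_scaled @ z + u) by fixed-point iteration."""
    z = np.zeros_like(u)
    for i in range(max_iter):
        z_next = relu(A_scaled @ z + u)
        if np.linalg.norm(z_next - z) <= tol:
            return z_next, i + 1
        z = z_next
    return z, max_iter


rng = np.random.default_rng(0)
m, d = 64, 8                                   # hypothetical width and input dim
A = rng.standard_normal((m, m))                # implicit-layer weight matrix
W = rng.standard_normal((m, d)) / np.sqrt(m)   # input weights
x = rng.standard_normal(d)
u = W @ x                                      # input injection term

A_scaled = scale_to_contraction(A, gamma=0.5)
z_star, n_iter = solve_equilibrium(A_scaled, u)
residual = np.linalg.norm(z_star - relu(A_scaled @ z_star + u))
print(f"iterations: {n_iter}, residual: {residual:.2e}")
```

Because ReLU is 1-Lipschitz and the scaled matrix has spectral norm below 1, the update map is a contraction, so by the Banach fixed-point theorem the equilibrium exists, is unique, and the iteration converges geometrically. Without the rescaling, the same iteration may diverge or the equation may have no solution or several.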

Abstract

Implicit deep learning has recently become popular in the machine learning community because implicit models can achieve performance competitive with state-of-the-art deep networks while using significantly less memory and computation. However, our theoretical understanding of when and how first-order methods such as gradient descent (GD) converge on nonlinear implicit networks is limited. Although this type of problem has been studied for standard feed-forward networks, the case of implicit models remains intriguing because implicit networks have infinitely many layers, and the corresponding equilibrium equation may admit no solution or multiple solutions during training. This paper studies the convergence of both gradient flow (GF) and gradient descent for nonlinear ReLU-activated implicit networks. To deal with the well-posedness problem, we introduce a…

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning