Gradient Descent Optimizes Infinite-Depth ReLU Implicit Networks with Linear Widths
Tianxiang Gao, Hongyang Gao

TL;DR
This paper analyzes the convergence of gradient-based methods on infinite-depth ReLU implicit networks, demonstrating linear convergence when the network width scales linearly with the sample size.
Contribution
The paper introduces a scaling of the weight matrix that keeps the equilibrium equation well-posed, and proves that gradient flow and gradient descent converge on infinite-depth implicit networks whose width is linear in the sample size.
Findings
Gradient flow and gradient descent converge to a global minimum.
Convergence is linear when the network width scales linearly with the sample size.
Scaling the weight matrix ensures well-posedness during training.
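As a rough illustration of why scaling the weight matrix helps with well-posedness, the sketch below solves a ReLU equilibrium equation z = ReLU(A z + b) by fixed-point iteration. It is not the paper's construction: the normalization used here (dividing by the spectral norm and multiplying by a hypothetical factor `gamma` < 1), the function names, and the random test data are all assumptions made for the example. Because ReLU is 1-Lipschitz, a weight matrix with spectral norm below 1 makes the update map a contraction, so a unique equilibrium exists and the iteration reaches it.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def scale_weight(A, gamma=0.5):
    # Assumed scaling for illustration: rescale A so its spectral norm is gamma < 1.
    return gamma * A / np.linalg.norm(A, 2)

def solve_equilibrium(A_scaled, b, tol=1e-10, max_iter=1000):
    # Fixed-point iteration z <- relu(A_scaled @ z + b), starting from zero.
    z = np.zeros_like(b)
    for k in range(max_iter):
        z_next = relu(A_scaled @ z + b)
        if np.linalg.norm(z_next - z) <= tol:
            return z_next, k + 1
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")

rng = np.random.default_rng(0)
m = 64                                  # hypothetical layer width
A = rng.standard_normal((m, m))         # unscaled weight matrix
b = rng.standard_normal(m)              # stands in for the input injection

A_scaled = scale_weight(A, gamma=0.5)
z_star, iters = solve_equilibrium(A_scaled, b)

# Residual of the equilibrium equation at the returned point.
residual = np.linalg.norm(z_star - relu(A_scaled @ z_star + b))
print(f"iterations: {iters}, residual: {residual:.2e}")
```

Without the rescaling step, the spectral norm of a random 64 x 64 Gaussian matrix is well above 1, so the same iteration carries no contraction guarantee and the equilibrium equation may have no solution or several.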
Abstract
Implicit deep learning has recently become popular in the machine learning community since these implicit models can achieve competitive performance with state-of-the-art deep networks while using significantly less memory and computational resources. However, our theoretical understanding of when and how first-order methods such as gradient descent (GD) converge on nonlinear implicit networks is limited. Although this type of problem has been studied in standard feed-forward networks, the case of implicit models is still intriguing because implicit networks have infinitely many layers. The corresponding equilibrium equation may admit no solution or multiple solutions during training. This paper studies the convergence of both gradient flow (GF) and gradient descent for nonlinear ReLU-activated implicit networks. To deal with the well-posedness problem, we introduce a…
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Domain Adaptation and Few-Shot Learning
