Global Convergence of Over-parameterized Deep Equilibrium Models
Zenan Ling, Xingyu Xie, Qiuhao Wang, Zongpeng Zhang, Zhouchen Lin

TL;DR
This paper analyzes the training dynamics of over-parameterized deep equilibrium models (DEQs). It proves that gradient descent converges to a global optimum for the quadratic loss, given a condition on the initial equilibrium point, and introduces a probabilistic framework for analyzing random DEQs.
Contribution
It provides the first theoretical proof of global convergence for over-parameterized DEQs and introduces a novel probabilistic analysis framework for infinite-depth models.
Findings
Gradient descent converges to a global optimum at a linear rate for the quadratic loss.
A unique equilibrium point exists throughout training under mild over-parameterization.
A new probabilistic framework supports the analysis of infinite-depth weight-tied models.
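As a reading aid, a "linear rate" usually refers to geometric decay of the loss across gradient descent iterations. The generic form is sketched below; the symbols \theta_t (parameters at step t), \rho (contraction factor), and \hat{y} (model output) are illustrative assumptions and are not the paper's notation or constants.

```latex
L(\theta_t) \le (1 - \rho)^{t}\, L(\theta_0), \quad 0 < \rho < 1,
\qquad \text{with quadratic loss } L(\theta) = \tfrac{1}{2}\,\lVert \hat{y}(\theta) - y \rVert_2^2 .
```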
Abstract
A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. The training dynamics of over-parameterized DEQs are investigated in this study. By supposing a condition on the initial equilibrium point, we show that the unique equilibrium point always exists during the training process, and the gradient descent is proved to converge to a globally optimal solution at a linear convergence rate for the quadratic loss function. In order to show that the required initial condition is satisfied via mild over-parameterization, we perform a fine-grained analysis on random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the…
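The equilibrium point described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes a ReLU layer of the form z = relu(W z + U x), rescales W to spectral norm 0.5 so the map is a contraction with a unique fixed point, and uses plain fixed-point iteration in place of a dedicated root-finder. The names `deq_forward`, `W`, `U`, and all sizes are hypothetical, and implicit differentiation is not shown.

```python
import numpy as np

def deq_forward(W, U, x, tol=1e-10, max_iter=1000):
    """Iterate z <- relu(W z + U x) until successive iterates differ by less than tol."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = np.maximum(W @ z + U @ x, 0.0)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

rng = np.random.default_rng(0)
m, d = 64, 8                                   # hidden width and input dimension (assumed)
W = rng.standard_normal((m, m))
W *= 0.5 / np.linalg.norm(W, 2)                # rescale to spectral norm 0.5 (assumed)
U = rng.standard_normal((m, d)) / np.sqrt(d)   # input-injection weights
x = rng.standard_normal(d)

z_star = deq_forward(W, U, x)
# Residual of the equilibrium equation z = relu(W z + U x) at the returned point.
residual = np.linalg.norm(z_star - np.maximum(W @ z_star + U @ x, 0.0))
```

Because ReLU is 1-Lipschitz and W has spectral norm below 1 in this sketch, the iteration contracts and the fixed point is unique. The paper's setting instead establishes existence and uniqueness of the equilibrium throughout training under its own conditions.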
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis
