Global Convergence of Over-parameterized Deep Equilibrium Models

Zenan Ling; Xingyu Xie; Qiuhao Wang; Zongpeng Zhang; Zhouchen Lin

arXiv:2205.13814 · cs.LG · March 30, 2023 · 1 citation

Open Access

TL;DR

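This paper analyzes the training dynamics of over-parameterized deep equilibrium models (DEQs). Under a condition on the initial equilibrium point, which mild over-parameterization is shown to satisfy, gradient descent converges to a global optimum at a linear rate for the quadratic loss. The paper also introduces a probabilistic framework for analyzing random DEQs.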

Contribution

It provides the first theoretical proof of global convergence for over-parameterized DEQs and introduces a novel probabilistic analysis framework for infinite-depth models.

Findings

01

For the quadratic loss, gradient descent converges to a global optimum at a linear rate.

02

A unique equilibrium point exists throughout training under mild over-parameterization.

03

A new probabilistic framework for analyzing infinite-depth weight-tied models.
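
To make the terms in the findings concrete, the block below writes out what an equilibrium point and a linear convergence rate mean in generic form. The symbols ($z^*$, $W$, $U$, $\sigma$, $\theta_t$, $\rho$) are illustrative assumptions and not necessarily the paper's notation; the constant $\rho$ is left unspecified because the page does not state it.

```latex
% Equilibrium point of a weight-tied layer with input injection (illustrative notation):
z^{*} = \sigma\!\left(W z^{*} + U x\right)

% Linear (geometric) convergence of the loss L along the gradient descent iterates \theta_t:
L(\theta_t) \;\le\; \rho^{\,t}\, L(\theta_0), \qquad 0 \le \rho < 1
```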

Abstract

A deep equilibrium model (DEQ) is implicitly defined through an equilibrium point of an infinite-depth weight-tied model with an input-injection. Instead of infinite computations, it solves an equilibrium point directly with root-finding and computes gradients with implicit differentiation. The training dynamics of over-parameterized DEQs are investigated in this study. By supposing a condition on the initial equilibrium point, we show that the unique equilibrium point always exists during the training process, and the gradient descent is proved to converge to a globally optimal solution at a linear convergence rate for the quadratic loss function. In order to show that the required initial condition is satisfied via mild over-parameterization, we perform a fine-grained analysis on random DEQs. We propose a novel probabilistic framework to overcome the technical difficulty in the…
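
The two mechanisms named in the abstract, solving for the equilibrium point and differentiating through it implicitly, can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's construction: the layer form `z = relu(W z + U x)`, the scalar readout `a`, the scaling of `W` to spectral norm 0.5 (so the map is a contraction and the equilibrium is unique), and all function names are hypothetical choices made for the example. Plain fixed-point iteration stands in for a general root-finder.

```python
import numpy as np

def relu(v):
    return np.maximum(v, 0.0)

def solve_equilibrium(W, U, x, z0=None, tol=1e-13, max_iter=1000):
    """Fixed-point iteration for z = relu(W z + U x) (assumed layer form)."""
    z = np.zeros(W.shape[0]) if z0 is None else z0.copy()
    for _ in range(max_iter):
        z_next = relu(W @ z + U @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def loss_and_grad_W(W, U, a, x, y):
    """Quadratic loss 0.5 * (a . z* - y)^2 and its gradient w.r.t. W.

    Implicit differentiation: with D = diag(relu'(W z* + U x)),
    (I - D W) dz = D dW z*, so grad_W = r * (D v) z*^T
    where v solves (I - D W)^T v = a and r is the residual.
    """
    z = solve_equilibrium(W, U, x)
    r = a @ z - y
    d = (W @ z + U @ x > 0).astype(float)      # ReLU derivative at equilibrium
    J = np.eye(len(z)) - d[:, None] * W        # I - D W
    v = np.linalg.solve(J.T, a)                # one linear solve, no unrolling
    return 0.5 * r**2, r * np.outer(d * v, z)

# Hypothetical sizes and data for illustration only.
rng = np.random.default_rng(0)
m, d_in = 20, 5
G = rng.standard_normal((m, m))
W = 0.5 * G / np.linalg.norm(G, 2)             # spectral norm 0.5 -> contraction
U = rng.standard_normal((m, d_in)) / np.sqrt(d_in)
a = rng.standard_normal(m) / np.sqrt(m)
x = rng.standard_normal(d_in)
y = 1.0

z_star = solve_equilibrium(W, U, x)
loss, grad = loss_and_grad_W(W, U, a, x, y)
print("equilibrium residual:", np.linalg.norm(z_star - relu(W @ z_star + U @ x)))
print("loss:", loss, "gradient norm:", np.linalg.norm(grad))
```

Because ReLU is 1-Lipschitz, keeping the spectral norm of `W` below 1 is one simple sufficient condition for a unique equilibrium; the paper's own condition on the initial equilibrium point is not reproduced here.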

Taxonomy

Topics: Domain Adaptation and Few-Shot Learning · Model Reduction and Neural Networks · Generative Adversarial Networks and Image Synthesis