Global Convergence Rate of Deep Equilibrium Models with General Activations

Lan V. Truong

arXiv:2302.05797 · stat.ML · February 14, 2025

TL;DR

This paper extends the analysis of Deep Equilibrium Models (DEQs) to general activation functions with bounded first and second derivatives. It proves that gradient descent converges to a global optimum at a linear rate, matching the result previously shown for ReLU-based DEQs.

Contribution

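It introduces an approach based on Hermite polynomial expansion for analyzing DEQs with non-homogeneous activations. This approach is used to bound the least eigenvalue of the Gram matrix of the equilibrium point, which in turn yields the global convergence guarantee.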
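The Hermite-expansion idea rests on a classical construction. The dual activation of σ is the map ρ ↦ E[σ(u)σ(v)], where u and v are standard Gaussians with correlation ρ. It satisfies E[σ(u)σ(v)] = Σ_n a_n² ρ^n, where a_n are the coefficients of σ in the normalized probabilists' Hermite basis.

The sketch below computes this standard dual activation. It is not the new form of dual activation developed in the paper. A few things are illustrative assumptions rather than choices from the paper:

- tanh stands in for an activation with bounded first and second derivatives;
- the function names are made up for this example;
- the truncation length and quadrature degree are arbitrary.

```python
import math

import numpy as np
from numpy.polynomial import hermite_e as H


def hermite_coeffs(sigma, n_terms=20, quad_deg=100):
    """Return a_n = E[sigma(g) He_n(g)] / sqrt(n!) for g ~ N(0, 1), n < n_terms."""
    # Gauss-Hermite_e nodes/weights integrate against exp(-x^2 / 2).
    x, w = H.hermegauss(quad_deg)
    w = w / math.sqrt(2 * math.pi)  # rescale so the weights give expectations under N(0, 1)
    coeffs = []
    for n in range(n_terms):
        basis = np.zeros(n + 1)
        basis[n] = 1.0
        he_n = H.hermeval(x, basis)  # probabilists' Hermite polynomial He_n at the nodes
        coeffs.append(np.sum(w * sigma(x) * he_n) / math.sqrt(math.factorial(n)))
    return np.array(coeffs)


def dual_activation(rho, coeffs):
    """Truncated dual activation: sum_n a_n^2 * rho^n."""
    powers = rho ** np.arange(len(coeffs))
    return float(np.sum(coeffs**2 * powers))


def monte_carlo_dual(sigma, rho, n_samples=200_000, seed=0):
    """Estimate E[sigma(u) sigma(v)] directly for standard normals with correlation rho."""
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n_samples)
    z = rng.standard_normal(n_samples)
    v = rho * u + math.sqrt(1 - rho**2) * z  # corr(u, v) = rho
    return float(np.mean(sigma(u) * sigma(v)))
```

For example, `dual_activation(0.5, hermite_coeffs(np.tanh))` should agree with `monte_carlo_dual(np.tanh, 0.5)` up to Monte Carlo error. Because tanh is odd, its even-order Hermite coefficients vanish.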

Findings

01

A linear global convergence rate holds for DEQs with any activation whose first and second derivatives are bounded.

02

New techniques for analyzing non-homogeneous activations.

03

A novel population Gram matrix and a new form of dual activation based on Hermite polynomial expansion.
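A toy case shows why the least eigenvalue of a Gram matrix controls a linear rate. Suppose only a linear output layer is trained by gradient descent on the quadratic loss, over fixed features Z. With step size η = 1/λ_max(ZZᵀ), the residual obeys r_{k+1} = (I - ηZZᵀ) r_k. It follows that ||r_k|| ≤ (1 - λ_min/λ_max)^k ||r_0||.

The sketch below checks this bound numerically using random tanh features. It is a simplified illustration under these assumptions, not the paper's argument, which must also handle training of the equilibrium layer. All sizes and names are hypothetical.

```python
import numpy as np


def linear_rate_demo(n=20, d=5, m=200, steps=200, seed=0):
    """GD on 0.5 * ||Z a - y||^2 over fixed random tanh features (hypothetical toy setup)."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)  # unit-norm inputs
    y = rng.standard_normal(n)
    W = rng.standard_normal((d, m))
    Z = np.tanh(X @ W) / np.sqrt(m)  # (n, m) features, m > n: over-parametrized
    K = Z @ Z.T                       # (n, n) Gram matrix
    eigs = np.linalg.eigvalsh(K)      # ascending eigenvalues
    lam_min, lam_max = eigs[0], eigs[-1]
    eta = 1.0 / lam_max

    a = np.zeros(m)
    residuals = []
    for _ in range(steps + 1):
        r = Z @ a - y
        residuals.append(np.linalg.norm(r))
        a -= eta * (Z.T @ r)          # gradient step on the output weights

    # Linear-rate bound governed by the least eigenvalue of K.
    bound = (1 - eta * lam_min) ** np.arange(steps + 1) * residuals[0]
    return np.array(residuals), bound, lam_min, lam_max
```

The contraction factor 1 - λ_min/λ_max is strictly below 1 exactly when the Gram matrix is positive definite. This is why a lower bound on its least eigenvalue is the key step in this style of proof.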

Abstract

In a recent paper, Ling et al. investigated the over-parametrized Deep Equilibrium Model (DEQ) with ReLU activation. They proved that gradient descent converges to a globally optimal solution at a linear rate for the quadratic loss function. This paper shows that the same result holds for DEQs with any activation whose first and second derivatives are bounded. Unlike ReLU, such an activation is generally not homogeneous, so bounding the least eigenvalue of the Gram matrix of the equilibrium point is particularly challenging. To accomplish this, we construct a novel population Gram matrix and develop a new form of dual activation based on Hermite polynomial expansion.
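A DEQ's output is computed from an equilibrium point z* satisfying z* = σ(W z* + U x). The sketch below makes the following assumptions, none of which is taken from the paper:

- It uses this simple parametrization; the exact form and scaling used by Ling et al. and in the paper may differ.
- It takes σ = tanh.
- It rescales W to spectral norm 0.5. Since |tanh'| ≤ 1, the map is then a contraction, so a unique fixed point exists and plain iteration reaches it.
- All names and sizes are hypothetical.

It then forms the empirical Gram matrix of the equilibrium points, G = Z Zᵀ / m. The least eigenvalue of this matrix is the quantity the abstract describes as hard to bound for non-homogeneous activations.

```python
import numpy as np


def deq_equilibrium(x, W, U, sigma=np.tanh, tol=1e-10, max_iter=1000):
    """Solve z = sigma(W z + U x) by fixed-point iteration (hypothetical DEQ layer)."""
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = sigma(W @ z + U @ x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    raise RuntimeError("fixed-point iteration did not converge")


def equilibrium_gram(n=10, d=4, m=256, w_norm=0.5, seed=0):
    """Equilibrium points for n random unit-norm inputs and their Gram matrix."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    W = rng.standard_normal((m, m))
    W *= w_norm / np.linalg.norm(W, 2)  # spectral norm = w_norm < 1 -> contraction for tanh
    U = rng.standard_normal((m, d))
    Z = np.stack([deq_equilibrium(x, W, U) for x in X])  # (n, m) equilibrium points
    G = Z @ Z.T / m                                       # (n, n) Gram matrix
    return X, W, U, Z, G
```

Calling `np.linalg.eigvalsh(G)[0]` on the returned `G` gives the least eigenvalue for this random instance. For distinct inputs it should be strictly positive.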
