Gradient descent for deep equilibrium single-index models

Sanjit Dandapanthula; Aaditya Ramdas

arXiv:2511.16976·cs.LG·January 13, 2026

Open Access

TL;DR

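This paper rigorously analyzes gradient descent dynamics for deep equilibrium models (DEQs) in two simple settings: linear models and single-index models. It proves a conservation law for linear DEQs, shows that gradient flow stays well-conditioned for all time, and establishes linear convergence of gradient descent to a global minimizer.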

Contribution

It gives rigorous conditioning and convergence results for gradient descent on linear DEQs and deep equilibrium single-index models, filling several gaps in the existing literature.

Findings

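01 Gradient flow for linear DEQs remains well-conditioned for all time.

02 Gradient descent converges linearly to a global minimizer for linear DEQs and deep equilibrium single-index models, under the paper's assumptions.

03 A conservation law keeps the parameters of linear DEQs on spheres during training.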

Abstract

Deep equilibrium models (DEQs) have recently emerged as a powerful paradigm for training infinitely deep weight-tied neural networks that achieve state-of-the-art performance across many modern machine learning tasks. Despite their practical success, theoretically understanding the gradient descent dynamics for training DEQs remains an area of active research. In this work, we rigorously study the gradient descent dynamics for DEQs in the simple setting of linear models and single-index models, filling several gaps in the literature. We prove a conservation law for linear DEQs which implies that the parameters remain trapped on spheres during training and use this property to show that gradient flow remains well-conditioned for all time. We then prove linear convergence of gradient descent to a global minimizer for linear DEQs and deep equilibrium single-index models under appropriate…
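To make "infinitely deep weight-tied network" concrete, here is a minimal sketch of the forward pass of a linear DEQ. The abstract does not state the paper's parametrization, so this assumes the common form in which the output is the fixed point of z = Wz + Ux. The names `matvec`, `deq_forward`, `W`, `U`, and `x` are illustrative and not taken from the paper.

```python
def matvec(A, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]


def deq_forward(W, U, x, num_iters=200):
    """Apply the same layer z <- W z + U x repeatedly, starting from z = 0.

    Assumption: W is a contraction (here, every row of W has absolute
    sum below 1), so the iteration converges to the unique fixed point
    z* = (I - W)^{-1} U x.
    """
    ux = matvec(U, x)
    z = [0.0] * len(W)
    for _ in range(num_iters):
        wz = matvec(W, z)
        z = [a + b for a, b in zip(wz, ux)]
    return z


# Hypothetical toy weights: largest absolute row sum of W is 0.4.
W = [[0.2, -0.1],
     [0.3, 0.1]]
U = [[1.0, 0.0, 2.0],
     [0.5, -1.0, 0.0]]
x = [1.0, 2.0, -1.0]

z_star = deq_forward(W, U, x)

# Residual of the fixed-point equation z = W z + U x at the returned point.
residual = max(
    abs(zi - (wi + ui))
    for zi, wi, ui in zip(z_star, matvec(W, z_star), matvec(U, x))
)
```

Solving (I - W) z = U x by hand for these toy weights gives z* = (-1, -2), which the iteration should reproduce up to floating-point error. Training such a model means running gradient descent on W and U through this fixed point, which is the dynamics the paper analyzes.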

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Advanced Graph Neural Networks · Model Reduction and Neural Networks