Deep Equilibrium Models
Shaojie Bai, J. Zico Kolter, Vladlen Koltun

TL;DR
Deep equilibrium models (DEQs) model sequences by solving directly for the fixed point of a weight-tied hidden layer, so training and prediction need only constant memory regardless of the network's effective depth.
Contribution
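The paper introduces DEQ, which finds the equilibrium point of a deep sequence model by root-finding and backpropagates through that point with implicit differentiation. This is equivalent to running an infinite-depth, weight-tied feedforward network while using constant memory.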
Findings
DEQs often improve on state-of-the-art sequence models on large-scale language modeling tasks such as WikiText-103.
DEQs have computational requirements similar to those of existing models but use far less memory.
Memory consumption drops by up to 88% in the paper's experiments.
Abstract
We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding. Such a method is equivalent to running an infinite depth (weight-tied) feedforward network, but has the notable advantage that we can analytically backpropagate through the equilibrium point using implicit differentiation. Using this approach, training and prediction in these networks require only constant memory, regardless of the effective "depth" of the network. We demonstrate how DEQs can be applied to two state-of-the-art deep sequence models: self-attention transformers and trellis networks. On large-scale language modeling tasks, such as the WikiText-103 benchmark, we show that…
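The two ideas in the abstract, solving for the equilibrium point and differentiating through it implicitly, can be sketched on a toy problem. The code below is a minimal illustration under stated assumptions and is not the authors' implementation: the layer f(z, x) = tanh(W z + x), the helper names solve_fixed_point and implicit_grad_x, and the tiny dimensions are all hypothetical. Plain forward iteration stands in for the root-finder, and a dense linear solve stands in for whatever solver a real model would use in the backward pass. W is rescaled so that the layer is a contraction and the iteration is guaranteed to converge.

```python
import numpy as np

def f(z, x, W):
    """One weight-tied layer (hypothetical toy choice): tanh(W z + x)."""
    return np.tanh(W @ z + x)

def solve_fixed_point(x, W, tol=1e-12, max_iter=500):
    """Find z* with z* = f(z*, x, W) by repeated application of f.

    Only the current iterate is kept, so memory does not grow
    with the number of iterations (the effective depth).
    """
    z = np.zeros_like(x)
    for _ in range(max_iter):
        z_next = f(z, x, W)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def implicit_grad_x(z_star, W, grad_z):
    """Gradient of a loss with respect to x, using only z*.

    Differentiating z* = f(z*, x) gives dz*/dx = (I - J)^{-1} D,
    where J = df/dz = diag(d) W and D = df/dx = diag(d),
    with d = 1 - z*^2 (the tanh derivative at the fixed point).
    Hence dl/dx = D (I - J)^{-T} grad_z; no iterates are stored.
    """
    d = 1.0 - z_star ** 2
    J = d[:, None] * W
    u = np.linalg.solve((np.eye(len(z_star)) - J).T, grad_z)
    return d * u

# Toy setup: all values here are illustrative assumptions.
rng = np.random.default_rng(0)
n = 4
W = rng.standard_normal((n, n))
W *= 0.5 / np.linalg.norm(W, 2)  # spectral norm 0.5, so f contracts in z
x = rng.standard_normal(n)

z_star = solve_fixed_point(x, W)
residual = np.linalg.norm(f(z_star, x, W) - z_star)

# Loss l(z*) = sum(z*), so dl/dz* is a vector of ones.
grad_x = implicit_grad_x(z_star, W, np.ones(n))
```

The backward pass here touches only the equilibrium point z_star, never the intermediate iterates, which is the property behind the constant-memory claim. A finite-difference check on x is a simple way to confirm that the implicit gradient matches the gradient of the iterated network.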
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
Methods: Deep Equilibrium Models
