Deep Equilibrium Models

Shaojie Bai; J. Zico Kolter; Vladlen Koltun

arXiv:1909.01377·cs.LG·October 30, 2019·246 cites

TL;DR

Deep equilibrium models (DEQ) offer a memory-efficient way to model sequences by directly finding fixed points of hidden layers, enabling effective training of deep models with constant memory.

Contribution

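The paper introduces DEQ, which uses root-finding to solve directly for the equilibrium point that the hidden layers of a weight-tied deep sequence model converge towards, and backpropagates through that point with implicit differentiation. This is equivalent to running an infinite-depth network while keeping memory use constant.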

Findings

01. DEQ often outperforms state-of-the-art models on language tasks.

02. DEQ requires similar computation but less memory than traditional models.

03. Memory reduction can be up to 88% in experiments.

Abstract

We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding. Such a method is equivalent to running an infinite depth (weight-tied) feedforward network, but has the notable advantage that we can analytically backpropagate through the equilibrium point using implicit differentiation. Using this approach, training and prediction in these networks require only constant memory, regardless of the effective "depth" of the network. We demonstrate how DEQs can be applied to two state-of-the-art deep sequence models: self-attention transformers and trellis networks. On large-scale language modeling tasks, such as the WikiText-103 benchmark, we show that…
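
The two ideas in the abstract, solving for the equilibrium point and differentiating through it implicitly, can be illustrated with a small NumPy sketch. Everything here is an assumption made for illustration rather than the paper's implementation: the single tanh layer `f`, the helper names `solve_fixed_point` and `implicit_grad_W`, the toy sizes, and the loss (the sum of the equilibrium state). Plain fixed-point iteration and a dense linear solve stand in for the root-finder the abstract mentions, and `W` is rescaled so that the layer is a contraction and the iteration converges.

```python
import numpy as np

def f(z, x, W, U, b):
    """One weight-tied layer (hypothetical): z_next = tanh(W z + U x + b)."""
    return np.tanh(W @ z + U @ x + b)

def solve_fixed_point(x, W, U, b, tol=1e-12, max_iter=500):
    """Iterate z <- f(z, x) until successive iterates differ by less than tol.

    Only the current iterate is kept, so memory does not grow with the
    number of iterations (the effective "depth").
    """
    z = np.zeros(W.shape[0])
    for _ in range(max_iter):
        z_next = f(z, x, W, U, b)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

def implicit_grad_W(z_star, x, W, U, b, dL_dz):
    """Gradient of a loss L(z*) with respect to W, using only z* itself."""
    pre = W @ z_star + U @ x + b
    d = 1.0 - np.tanh(pre) ** 2        # tanh derivative at the equilibrium
    J = d[:, None] * W                 # Jacobian df/dz evaluated at z*
    # Implicit function theorem: solve (I - J)^T u = dL/dz*
    u = np.linalg.solve((np.eye(len(z_star)) - J).T, dL_dz)
    # df_i/dW_ij = d_i * z*_j, hence dL/dW = outer(u * d, z*)
    return np.outer(u * d, z_star)

# Toy problem (sizes and values are arbitrary assumptions).
rng = np.random.default_rng(0)
n, m = 4, 3
A = rng.standard_normal((n, n))
W = 0.5 * A / np.linalg.norm(A, 2)     # spectral norm 0.5, so f is a contraction
U = rng.standard_normal((n, m))
b = rng.standard_normal(n)
x = rng.standard_normal(m)

z_star = solve_fixed_point(x, W, U, b)
grad_W = implicit_grad_W(z_star, x, W, U, b, dL_dz=np.ones(n))  # L = sum(z*)
print("residual:", np.linalg.norm(f(z_star, x, W, U, b) - z_star))
```

The backward step needs only `z_star` and one linear solve, not the sequence of intermediate iterates, which is the source of the constant-memory property described above. A finite-difference check on a single entry of `W` is a quick way to confirm the implicit gradient.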

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Deep Equilibrium Models