Hungry Hungry Hippos: Towards Language Modeling with State Space Models

Daniel Y. Fu; Tri Dao; Khaled K. Saab; Armin W. Thomas; Atri Rudra; Christopher Ré

arXiv:2212.14052 · cs.LG · May 2, 2023 · 117 cites

TL;DR

This paper introduces H3, a new SSM layer that improves expressivity for language modeling, and FlashConv, an FFT-based algorithm that improves the hardware efficiency of SSMs. Hybrid H3-attention models outperform Transformers in perplexity on OpenWebText.

Contribution

Introduces H3, a new SSM layer designed for language tasks, and FlashConv, a fast FFT-based algorithm, to close the gap between SSMs and attention-based models.

Findings

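01. H3 matches attention on the synthetic language tasks and comes within 0.4 PPL of Transformers on OpenWebText.

02. A hybrid 125M-parameter H3-attention model that retains two attention layers outperforms Transformers on OpenWebText by 1.0 PPL.

03. FlashConv yields a 2× speedup on the Long Range Arena benchmark and lets hybrid language models generate text 2.4× faster than Transformers, which makes it practical to scale hybrid H3-attention models up to 2.7B parameters.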

Abstract

State space models (SSMs) have demonstrated state-of-the-art sequence modeling performance in some modalities, but underperform attention in language modeling. Moreover, despite scaling nearly linearly in sequence length instead of quadratically, SSMs are still slower than Transformers due to poor hardware utilization. In this paper, we make progress on understanding the expressivity gap between SSMs and attention in language modeling, and on reducing the hardware barrier between SSMs and attention. First, we use synthetic language modeling tasks to understand the gap between SSMs and attention. We find that existing SSMs struggle with two capabilities: recalling earlier tokens in the sequence and comparing tokens across the sequence. To understand the impact on language modeling, we propose a new SSM layer, H3, that is explicitly designed for these abilities. H3 matches attention on…
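The near-linear scaling mentioned in the abstract comes from a standard property of linear SSMs: the step-by-step recurrence can be rewritten as one long causal convolution, which an FFT evaluates in O(N log N) time. The sketch below illustrates only that equivalence. It is not FlashConv and not the H3 layer; the diagonal state matrix, the function names, and the toy sizes are all assumptions made for illustration.

```python
import numpy as np

def ssm_recurrent(a, b, c, u):
    """Diagonal linear SSM run step by step: x_k = a * x_{k-1} + b * u_k, y_k = c . x_k."""
    x = np.zeros_like(a)
    y = np.empty(len(u))
    for k, u_k in enumerate(u):
        x = a * x + b * u_k
        y[k] = np.dot(c, x)
    return y

def ssm_kernel(a, b, c, length):
    """Materialize the equivalent convolution kernel K_m = sum_i c_i * a_i**m * b_i."""
    powers = a[None, :] ** np.arange(length)[:, None]  # shape (length, state_dim)
    return powers @ (c * b)

def fft_causal_conv(kernel, u):
    """Causal convolution via FFT, zero-padded to 2N so circular wrap-around cannot leak in."""
    n = len(u)
    fft_len = 2 * n
    k_f = np.fft.rfft(kernel, n=fft_len)
    u_f = np.fft.rfft(u, n=fft_len)
    return np.fft.irfft(k_f * u_f, n=fft_len)[:n]

# Toy problem (hypothetical sizes): 4-dimensional state, 64-step input.
rng = np.random.default_rng(0)
state_dim, seq_len = 4, 64
a = rng.uniform(0.1, 0.9, state_dim)   # stable diagonal state matrix (assumption)
b = rng.standard_normal(state_dim)
c = rng.standard_normal(state_dim)
u = rng.standard_normal(seq_len)

y_rec = ssm_recurrent(a, b, c, u)
y_fft = fft_causal_conv(ssm_kernel(a, b, c, seq_len), u)
max_abs_diff = float(np.max(np.abs(y_rec - y_fft)))
print(max_abs_diff)  # the two paths should agree up to floating-point error
```

The recurrent path touches the sequence one step at a time, while the FFT path processes the whole sequence at once. The abstract's point about poor hardware utilization concerns how efficiently this kind of FFT convolution runs on accelerators in practice, which this NumPy sketch does not attempt to model.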

Code & Models

Datasets

huaXiaKyrie/up
dataset · 19k downloads

Videos

Hungry Hungry Hippos: Towards Language Modeling with State Space Models · SlidesLive

Taxonomy

Topics: Topic Modeling