Taipan: Efficient and Expressive State Space Language Models with Selective Attention

Chien Van Nguyen; Huy Huu Nguyen; Thang M. Pham; Ruiyi Zhang; Hanieh Deilamsalehy; Puneet Mathur; Ryan A. Rossi; Trung Bui; Viet Dac Lai; Franck Dernoncourt; Thien Huu Nguyen

arXiv:2410.18572·cs.CL·October 25, 2024

Open Access

TL;DR

Taipan is a hybrid language model architecture combining Mamba-2 and Selective Attention Layers to efficiently handle extremely long contexts up to 1 million tokens, balancing performance and computational cost.

Contribution

Introduces Taipan, a novel hybrid model that integrates State Space Models (SSMs) with selective attention to improve the efficiency and accuracy of long-context language modeling.

Findings

01. Outperforms existing models on long-context tasks
02. Handles up to 1 million tokens efficiently
03. Balances memory usage with predictive performance

Abstract

Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they underperform in tasks requiring extensive in-context retrieval. We introduce Taipan, a novel hybrid architecture that combines Mamba-2 with Selective Attention Layers (SALs). These SALs identify tokens requiring long-range interactions, remove less important features, and then augment their representations using the attention module. This approach balances Mamba's efficiency with Transformer-like performance in memory-intensive tasks. By constraining the attention budget, Taipan extends accurate…
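The select-then-attend idea described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the linear scoring vector `w_score`, the top-k selection rule, the `budget` fraction, and the residual update are all assumptions made for illustration, and the feature-removal step mentioned in the abstract is omitted. The sketch only shows how capping the number of tokens that use attention bounds the attention cost, while the remaining tokens pass through unchanged.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def selective_attention(h, w_score, budget):
    """Apply causal attention only to a budgeted subset of tokens.

    h       : (T, d) hidden states, e.g. the output of an SSM block.
    w_score : (d,) hypothetical scoring vector (an assumption, not from the paper).
    budget  : fraction of tokens, in (0, 1], allowed to use attention.
    """
    T, d = h.shape
    k = max(1, int(round(budget * T)))

    # Score every token and keep the k highest-scoring positions, in order.
    scores = h @ w_score
    selected = np.sort(np.argsort(scores)[-k:])

    # Only the selected tokens act as queries; keys and values are all tokens.
    logits = h[selected] @ h.T / np.sqrt(d)            # (k, T)

    # Causal mask: a token at position i may only attend to positions <= i.
    future = np.arange(T)[None, :] > selected[:, None]
    logits = np.where(future, -np.inf, logits)
    attn = softmax(logits, axis=-1)

    # Augment the selected tokens; every other token is left untouched.
    out = h.copy()
    out[selected] = h[selected] + attn @ h
    return out, selected


rng = np.random.default_rng(0)
h = rng.standard_normal((8, 4))
w_score = rng.standard_normal(4)
out, selected = selective_attention(h, w_score, budget=0.25)
```

With `budget=0.25` and 8 tokens, only 2 tokens issue attention queries, so the attention logits have shape (2, 8) rather than (8, 8).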

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces