Taipan: Efficient and Expressive State Space Language Models with Selective Attention
Chien Van Nguyen, Huy Huu Nguyen, Thang M. Pham, Ruiyi Zhang, Hanieh, Deilamsalehy, Puneet Mathur, Ryan A. Rossi, Trung Bui, Viet Dac Lai, Franck, Dernoncourt, Thien Huu Nguyen

TL;DR
Taipan is a hybrid language model architecture combining Mamba-2 and Selective Attention Layers to efficiently handle extremely long contexts up to 1 million tokens, balancing performance and computational cost.
Contribution
Introducing Taipan, a novel hybrid model that integrates SSMs with selective attention to improve long-context language modeling efficiency and accuracy.
Findings
Outperforms existing models on long-context tasks
Handles up to 1 million tokens efficiently
Balances memory usage with predictive performance
Abstract
Efficient long-context language modeling remains a significant challenge in Natural Language Processing (NLP). While Transformers dominate language tasks, they struggle with long sequences due to quadratic computational complexity in training and linearly scaling memory costs during inference. Recent State Space Models (SSMs) such as Mamba offer alternatives with constant memory usage, but they underperform in tasks requiring extensive in-context retrieval. We introduce Taipan, a novel hybrid architecture that combines Mamba-2 with Selective Attention Layers (SALs). These SALs identify tokens requiring long-range interactions, remove less important features, and then augment their representations using the attention module. This approach balances Mamba's efficiency with Transformer-like performance in memory-intensive tasks. By constraining the attention budget, Taipan extends accurate…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques
MethodsSoftmax · Attention Is All You Need · Mamba: Linear-Time Sequence Modeling with Selective State Spaces
