Attamba: Attending To Multi-Token States

Yash Akhauri; Safeen Huda; Mohamed S. Abdelfattah

arXiv:2411.17685·cs.LG·November 27, 2024

TL;DR

Attamba uses state-space models (SSMs) to compress chunks of tokens and applies attention over the compressed key-value representations. Against a transformer with a similar KV-Cache and attention footprint it improves perplexity by 24%, and it can instead shrink KV-Cache and attention FLOPs by about 4 times for a 5% perplexity trade-off.

Contribution

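The paper presents Attamba, an architecture that replaces the key and value projections of a transformer with SSMs. The SSMs compress chunks of tokens, attention runs over the compressed keys and values, and the chunk size can be varied to trade model quality against cost.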

Findings

01

24% better perplexity than a transformer with a similar KV-Cache and attention footprint

02

Approximately 4 times smaller KV-Cache and attention FLOPs for a 5% perplexity trade-off

03

Enables smooth transition between quadratic and linear scaling in attention complexity
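The scaling claim in the third finding can be illustrated with simple counting. The sketch below is not taken from the paper: it assumes each chunk of tokens contributes exactly one compressed key-value entry and ignores causal masking and constant factors. The function names and the sequence length of 1024 are hypothetical choices for illustration.

```python
def kv_cache_entries(seq_len: int, chunk_size: int) -> int:
    """Number of cached key-value entries, assuming one entry per chunk."""
    # Ceiling division: a trailing partial chunk still needs one entry.
    return -(-seq_len // chunk_size)


def attention_pairs(seq_len: int, chunk_size: int) -> int:
    """Query-key pairs scored when every query sees one key per chunk."""
    return seq_len * kv_cache_entries(seq_len, chunk_size)


seq_len = 1024
for chunk_size in (1, 4, 32, 1024):
    print(chunk_size,
          kv_cache_entries(seq_len, chunk_size),
          attention_pairs(seq_len, chunk_size))
```

Under these assumptions a chunk size of 1 scores seq_len squared pairs, as in vanilla attention, while a chunk size equal to the sequence length scores only seq_len pairs. Intermediate chunk sizes fall between those two extremes.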

Abstract

When predicting the next token in a sequence, vanilla transformers compute attention over all previous tokens, resulting in quadratic scaling of compute with sequence length. State-space models compress the entire sequence of tokens into a fixed-dimensional representation to improve efficiency, while other architectures achieve sub-quadratic complexity via low-rank projections or sparse attention patterns over the sequence. In this paper, we introduce Attamba, a novel architecture that uses state-space models to compress chunks of tokens and applies attention on these compressed key-value representations. We find that replacing key and value projections in a transformer with SSMs can improve model quality and enable flexible token chunking, resulting in 24% better perplexity than a transformer with a similar KV-Cache and attention footprint, and ~4 times smaller KV-Cache and Attention FLOPs…
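The idea in the abstract, compressing chunks with an SSM and attending over the compressed keys and values, can be sketched in a few lines of NumPy. This is a toy illustration and not the authors' implementation: `toy_ssm` is a hypothetical stand-in (a fixed diagonal linear recurrence) for a real SSM layer such as Mamba, and reading the state at the last token of each chunk is an assumption about how a chunk gets summarized. All names, shapes, and decay values are made up for the example.

```python
import numpy as np


def toy_ssm(x: np.ndarray, decay: float) -> np.ndarray:
    """Toy recurrence h_t = decay * h_{t-1} + (1 - decay) * x_t.

    Placeholder for a learned SSM; x has shape (seq_len, dim).
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = decay * h + (1.0 - decay) * x[t]
        out[t] = h
    return out


def compress_chunks(states: np.ndarray, chunk_size: int) -> np.ndarray:
    """Keep only the state at the last token of each complete chunk."""
    return states[chunk_size - 1 :: chunk_size]


def attend(query: np.ndarray, keys: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Single-query softmax attention over the compressed keys and values."""
    scores = keys @ query / np.sqrt(query.shape[0])
    weights = np.exp(scores - scores.max())  # subtract max for stability
    weights /= weights.sum()
    return weights @ values


rng = np.random.default_rng(0)
seq_len, dim, chunk_size = 16, 8, 4
x = rng.standard_normal((seq_len, dim))
w_q = rng.standard_normal((dim, dim)) / np.sqrt(dim)

# Keys and values come from the SSM stand-in instead of linear projections.
keys = compress_chunks(toy_ssm(x, decay=0.9), chunk_size)
values = compress_chunks(toy_ssm(x, decay=0.8), chunk_size)
query = x[-1] @ w_q  # the query still uses an ordinary projection
output = attend(query, keys, values)
print(keys.shape, output.shape)
```

With 16 tokens and a chunk size of 4, the query attends over 4 compressed entries instead of 16, which is where the smaller KV-Cache and attention cost come from in this sketch.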

Code & Models

Repositories

abdelfattah-lab/attamba
PyTorch · Official

Taxonomy

Topics: Robotic Path Planning Algorithms

Methods: Softmax · Attention Is All You Need