Zamba: A Compact 7B SSM Hybrid Model
Paolo Glorioso, Quentin Anthony, Yury Tokpanov, James Whittington, Jonathan Pilault, Adam Ibrahim, Beren Millidge

TL;DR
Zamba is a compact 7B hybrid model that combines a state-space model (SSM) backbone with transformer attention. It was trained on 1 trillion tokens and achieves competitive performance with faster inference and lower memory usage than comparable transformers. It is open-sourced for community use.
Contribution
Zamba introduces a novel hybrid architecture that pairs a Mamba backbone with a single shared attention module. This design offers efficiency and speed advantages over transformer models of similar scale.
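The following plain-Python sketch makes the idea of a single shared attention module concrete. It lists the order in which blocks run in a Mamba backbone that calls one attention block every `share_every` layers. It then compares the parameter count with a design that gives each attention call its own weights. The names `layer_schedule` and `param_counts`, the interval `share_every`, and all sizes are hypothetical illustrations. This page does not specify Zamba's actual block layout or block sizes.

```python
from __future__ import annotations


def layer_schedule(n_mamba: int, share_every: int) -> list[str]:
    """Order in which blocks run (hypothetical layout): one Mamba block per step,
    plus the single shared attention block after every `share_every` Mamba blocks."""
    order = []
    for i in range(n_mamba):
        order.append(f"mamba_{i}")
        if (i + 1) % share_every == 0:
            order.append("shared_attn")  # same attention weights on every call
    return order


def param_counts(n_mamba: int, share_every: int, mamba_params: int, attn_params: int):
    """Return (attention calls, params with one shared block, params with one block per call)."""
    n_calls = n_mamba // share_every
    shared = n_mamba * mamba_params + attn_params  # shared weights counted once
    unshared = n_mamba * mamba_params + n_calls * attn_params  # separate weights per call
    return n_calls, shared, unshared
```

Consider 12 Mamba blocks of 100 parameters each, an attention block of 50 parameters, and `share_every=6`. In that case, `param_counts(12, 6, 100, 50)` returns `(2, 1250, 1300)`. The attention block runs twice, but its weights are paid for only once. The saving grows with the number of attention calls, which is how sharing keeps the parameter cost of attention low.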
Findings
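Achieves performance competitive with leading open-weight models at a comparable scale.
Faster inference than comparable transformer models and substantially lower memory requirements when generating long sequences.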
Open-sourced weights and checkpoints.
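A back-of-the-envelope estimate shows why an SSM-heavy hybrid needs less memory when generating long sequences. An attention layer caches keys and values for every past token, so its cache grows linearly with sequence length. A Mamba layer instead carries a fixed-size recurrent state. Sharing attention weights does not share the cache: each call to the shared block sees different inputs, so each call still needs its own keys and values. The function names and every dimension below are hypothetical, not Zamba's configuration. The SSM estimate also ignores Mamba's small convolution state.

```python
GIB = 1024 ** 3


def kv_cache_bytes(seq_len, n_attn_calls, n_kv_heads, head_dim, bytes_per_value=2):
    """Keys plus values cached for every token at every attention call (fp16 by default).

    Grows linearly with seq_len. A shared attention block still needs one cache per
    call, because each call attends over different inputs.
    """
    return 2 * n_attn_calls * seq_len * n_kv_heads * head_dim * bytes_per_value


def ssm_state_bytes(n_mamba, d_inner, d_state, bytes_per_value=2):
    """Fixed-size recurrent state per Mamba layer, independent of seq_len.

    Ignores the small per-layer convolution state.
    """
    return n_mamba * d_inner * d_state * bytes_per_value


# Hypothetical dense transformer: 32 attention layers, 32 KV heads of size 128, 4096 tokens.
dense = kv_cache_bytes(4096, n_attn_calls=32, n_kv_heads=32, head_dim=128)

# Hypothetical hybrid: 48 Mamba layers plus 8 calls to one shared attention block.
hybrid = kv_cache_bytes(4096, n_attn_calls=8, n_kv_heads=32, head_dim=128) + ssm_state_bytes(
    n_mamba=48, d_inner=8192, d_state=16
)

print(f"dense:  {dense / GIB:.2f} GiB")   # dense:  2.00 GiB
print(f"hybrid: {hybrid / GIB:.2f} GiB")  # hybrid: 0.51 GiB
```

Doubling `seq_len` doubles the attention cache but leaves `ssm_state_bytes` unchanged. The gap therefore widens as generated sequences get longer.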
Abstract
In this technical report, we present Zamba, a novel 7B SSM-transformer hybrid model which achieves competitive performance against leading open-weight models at a comparable scale. Zamba is trained on 1T tokens from openly available datasets and is the best non-transformer model at this scale. Zamba pioneers a unique architecture combining a Mamba backbone with a single shared attention module, thus obtaining the benefits of attention at minimal parameter cost. Due to its architecture, Zamba is significantly faster at inference than comparable transformer models and requires substantially less memory for generation of long sequences. Zamba is pretrained in two phases: the first phase is based on existing web datasets, while the second one consists of annealing the model over high-quality instruct and synthetic datasets, and is characterized by a rapid learning rate decay. We open-source…
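The abstract describes a second pretraining phase that anneals the model on high-quality data under a rapid learning-rate decay. Below is a minimal sketch of one such two-phase schedule. Only two details come from the abstract: there are two phases, and phase 2 uses a fast decay. Everything else is an assumption for illustration: the linear warmup, the cosine decay in phase 1, the geometric decay in phase 2, the function name `two_phase_lr`, and all numeric values.

```python
import math


def two_phase_lr(step, phase1_steps, phase2_steps, peak_lr, phase1_end_lr, final_lr, warmup_steps):
    """Hypothetical two-phase learning-rate schedule.

    Phase 1: linear warmup to peak_lr, then cosine decay to phase1_end_lr.
    Phase 2 (annealing): fast geometric decay from phase1_end_lr to final_lr
    over phase2_steps, held at final_lr afterwards.
    """
    if step < warmup_steps:
        return peak_lr * (step + 1) / warmup_steps
    if step < phase1_steps:
        progress = (step - warmup_steps) / max(1, phase1_steps - warmup_steps)
        return phase1_end_lr + 0.5 * (peak_lr - phase1_end_lr) * (1 + math.cos(math.pi * progress))
    # Fraction of the annealing phase completed, clamped to 1 after it ends.
    t = min(step - phase1_steps, phase2_steps) / phase2_steps
    return phase1_end_lr * (final_lr / phase1_end_lr) ** t
```

As an example, set `phase1_steps=1000`, `phase2_steps=100`, `peak_lr=1e-3`, `phase1_end_lr=1e-4`, `final_lr=1e-6`, and `warmup_steps=10`. With these values, the rate falls from 1e-4 to 1e-6 over the 100 annealing steps. That is two orders of magnitude in one tenth of the length of phase 1. The geometric form is just one simple way to get a fast, smooth decay; a linear ramp toward zero would also match the abstract's description.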
Code & Models
- 🤗 Zyphra/Zamba-7B-v1 (model) · 3.5k downloads · ♡ 29
- 🤗 Zyphra/Zamba-7B-v1-phase1 (model) · 5 downloads · ♡ 5
- 🤗 Zyphra/Zamba2-2.7B (model) · 2.5k downloads · ♡ 79
- 🤗 Zyphra/Zamba2-1.2B (model) · 2.7k downloads · ♡ 75
- 🤗 Zyphra/Zamba2-7B (model) · 440 downloads · ♡ 114
- 🤗 adamo1139/Zamba2-7B-ungated (model)
- 🤗 ssmits/Zamba2-1.2B (model) · 3 downloads
