eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing
Jiyong Kim, Jaeho Lee, Jiahao Lin, Alish Kanani, Miao Sun, Umit Y. Ogras, and Jaehyun Park

TL;DR
eMamba is a hardware acceleration framework that optimizes Mamba models for edge devices, achieving high efficiency and accuracy with significantly reduced resource consumption.
Contribution
This work introduces eMamba, the first end-to-end hardware acceleration framework designed specifically for Mamba models on edge platforms, including novel approximation and neural architecture search (NAS) techniques.
Findings
eMamba achieves 1.63-19.9× fewer parameters while maintaining accuracy.
It demonstrates 4.95-5.62× lower latency and 2.22-9.95× higher throughput on FPGA and ASIC.
eMamba generalizes well to large-scale natural language tasks with stable perplexity.
Abstract
State Space Model (SSM)-based machine learning architectures have recently gained significant attention for processing sequential data. Mamba, a recent sequence-to-sequence SSM, offers competitive accuracy with superior computational efficiency compared to state-of-the-art transformer models. While this advantage makes Mamba particularly promising for resource-constrained edge devices, no hardware acceleration frameworks are currently optimized for deploying it in such environments. This paper presents eMamba, a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. eMamba maximizes computational efficiency by replacing complex normalization layers with lightweight hardware-aware alternatives and approximating expensive operations, such as SiLU activation and exponentiation, considering the target applications. Then, it…
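The abstract names SiLU activation and exponentiation as expensive operations, but this excerpt does not say how eMamba approximates them. As a hedged illustration only, the Python sketch below shows two generic, widely used hardware-friendly techniques. It is not eMamba's actual design. The first computes e^x as 2^(x·log2 e): the integer part of the exponent becomes a power-of-two scale (a shift in fixed point), and the fractional part becomes a short polynomial. The second evaluates SiLU(x) = x·sigmoid(x) from a small precomputed table with linear interpolation, which avoids a runtime exp and division. The function names (`exp_approx`, `silu_pwl`), the table range [-8, 8] with step 0.5, and the polynomial coefficients are hypothetical choices made for illustration.

```python
import math

# Generic hardware-friendly approximations (illustrative only; NOT the eMamba design).

LOG2_E = 1.0 / math.log(2.0)  # log2(e), applied as a single constant multiply


def exp_approx(x: float) -> float:
    """Approximate e**x as 2**(x * log2 e).

    The integer part n of the exponent becomes a power-of-two scale (a shift or
    exponent-field add in hardware). The fractional part f in [0, 1) uses a
    quadratic that is exact at f = 0 and f = 1 (relative error below 0.5%).
    """
    t = x * LOG2_E
    n = math.floor(t)                    # integer part -> shift
    f = t - n                            # fractional part in [0, 1)
    p = 1.0 + f * (0.6565 + 0.3435 * f)  # hypothetical coefficients, ~2**f
    return math.ldexp(p, n)              # p * 2**n


# Piecewise-linear SiLU: a small breakpoint table plus linear interpolation.
PWL_LO, PWL_HI, PWL_STEP = -8.0, 8.0, 0.5  # hypothetical table parameters
PWL_X = [PWL_LO + i * PWL_STEP
         for i in range(int((PWL_HI - PWL_LO) / PWL_STEP) + 1)]
PWL_Y = [x / (1.0 + math.exp(-x)) for x in PWL_X]  # exact SiLU, precomputed offline


def silu_pwl(x: float) -> float:
    """Approximate SiLU(x) = x * sigmoid(x) by table lookup and interpolation."""
    if x <= PWL_LO:
        return 0.0  # SiLU(x) -> 0 for large negative x
    if x >= PWL_HI:
        return x    # SiLU(x) -> x for large positive x
    # Segment index, clamped so floating-point rounding near PWL_HI stays in range.
    i = min(int((x - PWL_LO) / PWL_STEP), len(PWL_X) - 2)
    slope = (PWL_Y[i + 1] - PWL_Y[i]) / PWL_STEP
    return PWL_Y[i] + slope * (x - PWL_X[i])


if __name__ == "__main__":
    for v in (-3.0, -0.7, 0.0, 1.3, 5.0):
        exact_exp = math.exp(v)
        exact_silu = v / (1.0 + math.exp(-v))
        print(f"x={v:5.2f}  exp~{exp_approx(v):9.4f} (exact {exact_exp:9.4f})  "
              f"silu~{silu_pwl(v):7.4f} (exact {exact_silu:7.4f})")
```

Running the script prints each approximation next to its exact value. The trade-off is typical of such designs: the base-2 split moves most of the dynamic range of exp into a cheap shift and leaves only a short polynomial, while the SiLU table spends a small amount of memory (33 entries here) to replace exp and division with one multiply-add. With step 0.5 the interpolation error stays below about 0.016, since SiLU's second derivative never exceeds 0.5 in magnitude.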
