eMamba: Efficient Acceleration Framework for Mamba Models in Edge Computing

Jiyong Kim, Jaeho Lee, Jiahao Lin, Alish Kanani, Miao Sun, Umit Y. Ogras, and Jaehyun Park

arXiv:2508.10370 · cs.LG · August 15, 2025

TL;DR

eMamba is an end-to-end hardware acceleration framework for deploying Mamba models on edge devices. It uses 1.63-19.9× fewer parameters while maintaining accuracy, and achieves 4.95-5.62× lower latency and 2.22-9.95× higher throughput on FPGA and ASIC.

Contribution

This work introduces eMamba, the first end-to-end hardware acceleration framework designed specifically for Mamba models on edge platforms. It includes novel approximation techniques and neural architecture search (NAS) techniques.

Findings

01

eMamba achieves 1.63-19.9× fewer parameters while maintaining accuracy.

02

eMamba achieves 4.95-5.62× lower latency and 2.22-9.95× higher throughput on FPGA and ASIC implementations.

03

eMamba generalizes well to large-scale natural language tasks with stable perplexity.

Abstract

State Space Model (SSM)-based machine learning architectures have recently gained significant attention for processing sequential data. Mamba, a recent sequence-to-sequence SSM, offers competitive accuracy with superior computational efficiency compared to state-of-the-art transformer models. While this advantage makes Mamba particularly promising for resource-constrained edge devices, no hardware acceleration frameworks are currently optimized for deploying it in such environments. This paper presents eMamba, a comprehensive end-to-end hardware acceleration framework explicitly designed for deploying Mamba models on edge platforms. eMamba maximizes computational efficiency by replacing complex normalization layers with lightweight hardware-aware alternatives and approximating expensive operations, such as SiLU activation and exponentiation, considering the target applications. Then, it…
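The abstract mentions approximating expensive operations such as the SiLU activation. As a rough illustration of what such an approximation can look like, the sketch below replaces SiLU with a piecewise-linear lookup table and measures the resulting error. This is not the scheme used in eMamba, which the text above does not specify; the function names, the input range of [-8, 8], and the 64-segment table are all assumptions chosen for illustration.

```python
import math


def silu(x: float) -> float:
    """Exact SiLU: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))


def build_table(lo: float = -8.0, hi: float = 8.0, segments: int = 64):
    """Sample exact SiLU at evenly spaced breakpoints.

    The range and segment count are hypothetical settings, not values
    taken from the paper.
    """
    step = (hi - lo) / segments
    ys = [silu(lo + i * step) for i in range(segments + 1)]
    return lo, hi, step, ys


def silu_pwl(x: float, table) -> float:
    """Piecewise-linear SiLU: table lookup plus linear interpolation."""
    lo, hi, step, ys = table
    if x <= lo:
        return 0.0  # SiLU approaches 0 for large negative inputs
    if x >= hi:
        return x    # SiLU approaches x for large positive inputs
    pos = (x - lo) / step
    idx = min(int(pos), len(ys) - 2)  # segment index, clamped to the table
    frac = pos - idx                  # position inside the segment
    return ys[idx] + frac * (ys[idx + 1] - ys[idx])


def max_abs_error(table, samples: int = 4001, span: float = 10.0) -> float:
    """Largest absolute deviation from exact SiLU over [-span, span]."""
    worst = 0.0
    for i in range(samples):
        x = -span + 2.0 * span * i / (samples - 1)
        worst = max(worst, abs(silu(x) - silu_pwl(x, table)))
    return worst


table = build_table()
print(f"max |error| on [-10, 10]: {max_abs_error(table):.5f}")
```

In hardware, a table of this kind trades a small, bounded error for the removal of the exponential and the division; the segment count is the knob that balances table size against accuracy.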
