Mamba-X: An End-to-End Vision Mamba Accelerator for Edge Computing Devices

Dongho Yoon; Gungyu Lee; Jaewon Chang; Yunjae Lee; Dongjae Lee; Minsoo Rhu

arXiv:2508.02977·cs.AR·August 6, 2025

TL;DR

Mamba-X is an end-to-end hardware accelerator for Vision Mamba, a computer vision model built on state space models (SSMs). By maximizing parallelism in the scan operation and reducing memory usage, it enables efficient, low-latency vision processing on edge devices.

Contribution

The paper introduces Mamba-X, a specialized accelerator for deploying Vision Mamba on edge hardware. It combines a systolic scan array with a hybrid, hardware-friendly quantization technique.

Findings

01

Vision Mamba achieves lower latency and memory consumption than traditional transformer models.

02

Enables efficient deployment of Vision Mamba on edge devices.

03

The hybrid quantization technique reduces memory usage and improves hardware efficiency without sacrificing accuracy.

Abstract

Transformers have proven effective in language modeling but are limited by high computational and memory demands that grow quadratically with input sequence length. State space models (SSMs) offer a promising alternative by reducing attention complexity from $O(L^{2})$ to $O(L)$ while also lowering overall memory consumption. Vision Mamba adapts the SSM approach for computer vision tasks, achieving lower latency and memory consumption than traditional transformer models. However, deploying Vision Mamba on edge devices is challenging due to its sequential scan operations, which hinder GPU efficiency. We propose Mamba-X, an end-to-end Vision Mamba accelerator that includes a systolic scan array to maximize parallelism and minimize memory traffic, along with a hybrid, hardware-friendly quantization technique to reduce memory usage and improve hardware efficiency without sacrificing accuracy.
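Why a "sequential" scan can still be parallelized is easier to see in a toy form. The sketch below assumes a scalar linear recurrence h_t = a_t * h_{t-1} + b_t as a simplified, single-channel stand-in for the SSM state update; Vision Mamba's real state is a vector with input-dependent parameters. It is not the Mamba-X systolic scan array, whose design the abstract does not detail. All function names (`combine`, `sequential_scan`, `parallel_scan`) are hypothetical. Each step is an affine map h -> a*h + b, and composing affine maps is associative. An inclusive prefix scan (Hillis-Steele here) can therefore compute every h_t in about log2(L) dependent rounds instead of L.

```python
import math
from typing import List, Tuple

Pair = Tuple[float, float]  # (a, b) represents the affine map h -> a*h + b


def combine(left: Pair, right: Pair) -> Pair:
    # Compose two affine updates: apply `left` (earlier step) first, then `right`.
    a1, b1 = left
    a2, b2 = right
    return (a1 * a2, a2 * b1 + b2)


def sequential_scan(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    # Reference recurrence h_t = a_t * h_{t-1} + b_t, one dependent step per element.
    h, out = h0, []
    for a_t, b_t in zip(a, b):
        h = a_t * h + b_t
        out.append(h)
    return out


def parallel_scan(a: List[float], b: List[float], h0: float = 0.0) -> List[float]:
    # Hillis-Steele inclusive scan: ceil(log2 L) rounds. Within one round every
    # position updates independently, so a round maps onto parallel hardware.
    elems: List[Pair] = list(zip(a, b))
    n = len(elems)
    offset = 1
    while offset < n:
        # The comprehension reads the previous round's list before it is replaced.
        elems = [
            combine(elems[i - offset], elems[i]) if i >= offset else elems[i]
            for i in range(n)
        ]
        offset *= 2
    # Each prefix (A_t, B_t) maps the initial state h0 to h_t = A_t * h0 + B_t.
    return [A * h0 + B for A, B in elems]


if __name__ == "__main__":
    a = [0.5, 0.9, 0.1, 1.0, 0.3]
    b = [1.0, 2.0, 3.0, 4.0, 5.0]
    seq = sequential_scan(a, b)
    par = parallel_scan(a, b)
    print(seq)
    print(all(math.isclose(x, y) for x, y in zip(seq, par)))
```

The trade-off is that Hillis-Steele performs O(L log L) combine operations in total, versus O(L) for the sequential loop. It buys shorter dependency chains with extra work, which is the kind of tension a dedicated scan datapath has to manage.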
