LightMamba: Efficient Mamba Acceleration on FPGA with Quantization and Hardware Co-design

Renjie Wei; Songqiang Xu; Linfeng Zhong; Zebin Yang; Qingyu Guo; Yuan Wang; Runsheng Wang; Meng Li

arXiv:2502.15260 · cs.CL · October 13, 2025

TL;DR

LightMamba introduces a co-designed FPGA-based approach combining quantization and hardware optimization to accelerate Mamba state space models, achieving significant energy efficiency and speed improvements over GPU baselines.

Contribution

LightMamba presents an FPGA-friendly post-training quantization method and an FPGA accelerator architecture co-designed with it for efficient Mamba inference.
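For readers new to Mamba, the computation being accelerated is a selective state-space scan. The sketch below is a simplified single-channel illustration written for this summary, not code from LightMamba. It assumes the standard Mamba discretization (decay exp(Δ·A), input term Δ·B·x) and omits the D·x skip term, the convolution, and gating. The function name `selective_scan` and its argument layout are hypothetical. Each token updates a fixed-size state h, so cost grows linearly with sequence length, but step t cannot start until step t-1 has finished, which is the kind of sequential dependency that complicates hardware parallelization.

```python
import math

def selective_scan(x, delta, A, B, C):
    """Simplified sequential selective SSM scan for a single channel
    (hypothetical helper, not LightMamba's implementation).

    x, delta: length-L sequences (input and per-token step size)
    A:        length-N diagonal of the state matrix (typically negative)
    B, C:     L x N input-dependent projections
    Returns the length-L output sequence y.
    """
    N = len(A)
    h = [0.0] * N                      # fixed-size state, independent of L
    ys = []
    for t in range(len(x)):            # one pass over the sequence: O(L * N)
        for n in range(N):
            a_bar = math.exp(delta[t] * A[n])            # discretized decay
            h[n] = a_bar * h[n] + delta[t] * B[t][n] * x[t]
        ys.append(sum(C[t][n] * h[n] for n in range(N)))  # y_t = C_t . h_t
    return ys

# Sanity check: with A = 0 and unit delta, B, C the scan is a running sum.
ys = selective_scan([1.0, 2.0, 3.0], [1.0] * 3, [0.0], [[1.0]] * 3, [[1.0]] * 3)
print(ys)  # [1.0, 3.0, 6.0]
```

With A = 0 the decay is exactly 1, so the state simply accumulates the inputs; a negative A makes older tokens fade geometrically instead.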

Findings

01. Achieves 4.65x to 6.06x higher energy efficiency than the GPU baseline.

02. Reaches 93 tokens/s on FPGA, 1.43x faster than the GPU baseline.

03. Reduces the majority of computation to 4-bit using rotation-assisted quantization and power-of-two SSM quantization.
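Rotation-assisted quantization rests on the observation that an orthonormal rotation can spread one large outlier across many channels before quantization, so the outlier no longer dictates a coarse step size. The sketch below illustrates that effect under stated assumptions: an orthonormal Walsh-Hadamard rotation and symmetric per-tensor INT4 quantization. This summary does not specify LightMamba's actual rotation matrices or quantization granularity, the 64-channel activation vector is made up, and `fwht` and `fake_quant_int4` are hypothetical helper names. Because the normalized Hadamard matrix is symmetric and orthonormal, applying the same transform twice undoes it, so errors measured after rotating back are directly comparable.

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; it is its own inverse.
    len(x) must be a power of two."""
    a = [float(v) for v in x]
    n = len(a)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for j in range(start, start + h):
                u, v = a[j], a[j + h]
                a[j], a[j + h] = u + v, u - v   # butterfly
        h *= 2
    norm = 1.0 / math.sqrt(n)
    return [v * norm for v in a]

def fake_quant_int4(x):
    """Symmetric per-tensor INT4 quantize-dequantize (codes in [-8, 7])."""
    scale = max(abs(v) for v in x) / 7.0 or 1.0
    return [max(-8, min(7, round(v / scale))) * scale for v in x]

def mse(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b)) / len(a)

# Hypothetical activation: 63 small values in [-1, 1] plus one outlier channel.
x = [0.5 * ((i * 7) % 5 - 2) for i in range(64)]
x[0] = 20.0

# Direct INT4: the outlier sets the scale, so every small value rounds to 0.
err_direct = mse(x, fake_quant_int4(x))
# Rotation-assisted INT4: rotate, quantize, rotate back.
err_rot = mse(x, fwht(fake_quant_int4(fwht(x))))

print(f"direct INT4 MSE:  {err_direct:.4f}")
print(f"rotated INT4 MSE: {err_rot:.4f}")
```

On this toy vector the outlier forces a direct scale of 20/7, so all 63 small values collapse to zero. After rotation every entry has magnitude at most (20 + 37.5)/8, which yields a much finer step and a lower reconstruction error.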

Abstract

State space models (SSMs) like Mamba have recently attracted much attention. Compared to Transformer-based large language models (LLMs), Mamba achieves linear computation complexity with the sequence length and demonstrates superior performance. However, Mamba is hard to accelerate due to the scattered activation outliers and the complex computation dependency, rendering existing LLM accelerators inefficient. In this paper, we propose LightMamba that co-designs the quantization algorithm and FPGA accelerator architecture for efficient Mamba inference. We first propose an FPGA-friendly post-training quantization algorithm that features rotation-assisted quantization and power-of-two SSM quantization to reduce the majority of computation to 4-bit. We further design an FPGA accelerator that partially unrolls the Mamba computation to balance the efficiency and hardware costs. Through…
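Power-of-two SSM quantization is attractive for hardware because, when every scale is a power of two, rescaling an integer product is a bit shift rather than a multiplication. The sketch below applies that idea to one element of the SSM state update h ← a·h + b·x, where a is the discretized decay and b the discretized input coefficient. The 8-bit width, the exponents, the input values, and the helper names `quantize`, `rescale`, and `ssm_update_int` are all illustrative assumptions, not settings or code from the paper.

```python
def quantize(v, e, bits=8):
    """Signed integer code for v at scale 2**e (round to nearest, saturate)."""
    qmax = 2 ** (bits - 1) - 1
    return max(-qmax - 1, min(qmax, round(v / 2.0 ** e)))

def rescale(q, e_from, e_to):
    """Move integer q from scale 2**e_from to 2**e_to using shifts only.
    Right shifts round half up; left shifts are exact."""
    s = e_to - e_from
    if s > 0:
        return (q + (1 << (s - 1))) >> s
    return q << -s

def ssm_update_int(a_q, h_q, b_q, x_q, e_a, e_h, e_b, e_x, bits=8):
    """One state update h <- a*h + b*x in integer arithmetic.
    The result is returned (saturated) at the state's own scale 2**e_h."""
    ah = rescale(a_q * h_q, e_a + e_h, e_h)  # product scale is 2**(e_a + e_h)
    bx = rescale(b_q * x_q, e_b + e_x, e_h)  # product scale is 2**(e_b + e_x)
    qmax = 2 ** (bits - 1) - 1
    return max(-qmax - 1, min(qmax, ah + bx))

# Hypothetical exponents: a and b use scale 2**-7, h and x use scale 2**-5.
E_A, E_B, E_H, E_X = -7, -7, -5, -5
a, h, b, x = 0.9, 1.5, 0.25, 2.0
h_q = ssm_update_int(quantize(a, E_A), quantize(h, E_H),
                     quantize(b, E_B), quantize(x, E_X), E_A, E_H, E_B, E_X)
print(h_q, h_q * 2.0 ** E_H, a * h + b * x)
```

The only multiplications left are the integer products a_q·h_q and b_q·x_q; aligning both terms to the state scale costs a shift and an add, and the dequantized result lands within one quantization step of the floating-point update.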

Code & Models

Models

PKU-SEC-Lab/LightMamba (Hugging Face model)