XAMBA: Enabling Efficient State Space Models on Resource-Constrained Neural Processing Units
Arghadip Das, Arnab Raha, Shamik Kundu, Soumendu Kumar Ghosh, Deepak Mathaikutty, and Vijay Raghunathan

TL;DR
XAMBA is a framework that enables and optimizes State-Space Models (SSMs) on existing neural processing units (NPUs), significantly improving their speed and efficiency on resource-constrained devices across a range of AI applications.
Contribution
XAMBA introduces a comprehensive methodology for enabling SSMs on commercial off-the-shelf (COTS) NPUs, optimizing their performance, and trading off accuracy for further performance gains. It is the first work to do so.
Findings
Achieves up to 4.8X speed-up on an Intel AI PC.
Replaces sequential operations, such as cumulative sums and reductions, with matrix-based computations that run more efficiently on the NPU.
Uses piecewise linear approximations to reduce the latency of expensive activation functions.
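The two optimizations above can be illustrated with a small, self-contained Python sketch. This is not XAMBA's implementation. Several choices here are assumptions made for illustration only:
- expressing a cumulative sum as a product with a lower-triangular mask of ones;
- expressing a full sum as a product with a row of ones;
- using SiLU (Swish, an activation used in Mamba-style blocks) as the example activation;
- the hypothetical helper names (`lower_triangular_mask`, `matvec`, `make_pwl`, and so on).

Plain Python lists stand in for tensors, and `matvec` stands in for whatever matrix engine the NPU provides.

```python
import math

# --- Matrix-based replacements for sequential reductions (illustrative) ---

def cumsum_sequential(x):
    """Reference prefix sum: each output depends on the previous one."""
    out, running = [], 0.0
    for v in x:
        running += v
        out.append(running)
    return out

def lower_triangular_mask(n):
    """Hypothetical helper: n x n matrix with ones on and below the diagonal.
    Built once per sequence length and reused."""
    return [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]

def matvec(m, x):
    """Dense matrix-vector product (stand-in for an NPU matrix engine)."""
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def cumsum_as_matmul(x, mask):
    """Prefix sum as mask @ x: no loop-carried dependency between outputs."""
    return matvec(mask, x)

def reducesum_as_matmul(x):
    """Full sum as a 1 x n row of ones times x."""
    return matvec([[1.0] * len(x)], x)[0]

# --- Piecewise linear (PWL) activation approximation (illustrative) ---

def silu(x):
    """SiLU / Swish: x * sigmoid(x)."""
    return x / (1.0 + math.exp(-x))

def make_pwl(f, lo, hi, segments):
    """Hypothetical helper: tabulate f at uniform breakpoints on [lo, hi] and
    return a function that linearly interpolates between them. Inputs outside
    [lo, hi] are extrapolated along the first or last segment."""
    step = (hi - lo) / segments
    xs = [lo + k * step for k in range(segments + 1)]
    ys = [f(v) for v in xs]

    def approx(x):
        # Segment index, clamped so out-of-range inputs use an end segment.
        k = min(max(int((x - lo) // step), 0), segments - 1)
        slope = (ys[k + 1] - ys[k]) / step
        return ys[k] + slope * (x - xs[k])

    return approx

if __name__ == "__main__":
    x = [1.0, 2.0, 3.0, 4.0]
    mask = lower_triangular_mask(len(x))
    print(cumsum_sequential(x), cumsum_as_matmul(x, mask), reducesum_as_matmul(x))
    silu_pwl = make_pwl(silu, -8.0, 8.0, 32)
    worst = max(abs(silu_pwl(t / 100) - silu(t / 100)) for t in range(-800, 801))
    print(f"max |PWL - SiLU| on [-8, 8]: {worst:.4f}")
```

Both halves show a trade-off.
- The matrix form of the cumulative sum spends O(n²) multiply-adds instead of O(n) additions, but it removes the step-to-step dependency, so the work maps onto dense matrix hardware.
- The PWL table accepts a small, bounded approximation error in exchange for cheap evaluation. For a smooth function, that error shrinks roughly with the square of the segment width.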
Abstract
State-Space Models (SSMs) have emerged as efficient alternatives to transformers for sequential data tasks. Because they scale linearly or near-linearly with sequence length, they are well suited to long-sequence applications in NLP, vision, and edge AI, including real-time transcription, translation, and contextual search. These applications require lightweight, high-performance models that can be deployed on resource-constrained devices such as laptops and PCs. Designing a specialized accelerator for every emerging neural network is costly and impractical; optimizing models for the NPUs already present in AI PCs is a more scalable solution. To this end, we propose XAMBA, the first framework to enable and optimize SSMs on commercial off-the-shelf (COTS) state-of-the-art (SOTA) NPUs. XAMBA follows a three-step methodology: (1) enabling SSMs on NPUs, (2) optimizing performance to meet KPI requirements,…
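The following sketch makes the linear-scaling claim concrete. It runs a generic diagonal, time-invariant linear SSM over a scalar input sequence. This is an assumption made for illustration. Mamba-style selective SSMs make the parameters input-dependent, and this toy is not the model XAMBA targets. The function name `ssm_scan` is hypothetical. Each token triggers one fixed-size state update, so the total work grows linearly with sequence length. Full self-attention, by contrast, compares every pair of tokens.

```python
def ssm_scan(a, b, c, xs):
    """Hypothetical helper: run a diagonal linear SSM over a scalar sequence.

    State update:  h_t[i] = a[i] * h_{t-1}[i] + b[i] * x_t
    Output:        y_t    = sum_i c[i] * h_t[i]
    One fixed-size update per token, so total work is O(len(xs) * len(a)).
    """
    h = [0.0] * len(a)
    ys = []
    for x in xs:
        h = [ai * hi + bi * x for ai, hi, bi in zip(a, h, b)]
        ys.append(sum(ci * hi for ci, hi in zip(c, h)))
    return ys

if __name__ == "__main__":
    # Single-state example with decay 0.5: an impulse decays geometrically.
    print(ssm_scan([0.5], [1.0], [1.0], [1.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25]
```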
