XAMBA: Enabling Efficient State Space Models on Resource-Constrained Neural Processing Units

Arghadip Das; Arnab Raha; Shamik Kundu; Soumendu Kumar Ghosh; Deepak Mathaikutty; and Vijay Raghunathan

arXiv:2502.06924·cs.LG·April 1, 2025

TL;DR

XAMBA is a framework that enables and optimizes State-Space Models on existing neural processing units, significantly improving their speed and efficiency for resource-constrained devices in various AI applications.

Contribution

It introduces a methodology for enabling SSMs on commercial off-the-shelf NPUs, optimizing their performance, and trading accuracy for performance, which the authors describe as a first in this domain.

Findings

01

Achieves up to a 4.8X speed-up on an Intel AI PC.

02

Replaces sequential operations with matrix-based computations for efficiency.

03

Uses piecewise linear approximations to reduce activation function latency.
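
Findings 02 and 03 can be illustrated with a minimal sketch. It is not taken from the XAMBA repository, and it rests on assumptions the summary above does not confirm: the sequential operation is taken to be a cumulative sum, the activation is taken to be softplus, and the helper names (`cumsum_as_matmul`, `build_pwl`, `eval_pwl`), the interval, and the segment count are all hypothetical choices made for illustration.

```python
import math
from bisect import bisect_right


def cumsum_sequential(x):
    """Reference version: running sum computed one element at a time."""
    out, total = [], 0.0
    for v in x:
        total += v
        out.append(total)
    return out


def cumsum_as_matmul(x):
    """Same result expressed as one matrix-vector product.

    The matrix is a lower-triangular mask of ones, so row i sums x[0..i].
    On hardware with a fast matrix engine this trades extra multiply-adds
    for the removal of the step-by-step dependency (assumed motivation).
    """
    n = len(x)
    mask = [[1.0 if j <= i else 0.0 for j in range(n)] for i in range(n)]
    return [sum(m * v for m, v in zip(row, x)) for row in mask]


def softplus(v):
    """Illustrative activation (an assumption, not named in the summary)."""
    return math.log1p(math.exp(v))


def build_pwl(fn, lo, hi, segments):
    """Sample fn at evenly spaced breakpoints on [lo, hi]."""
    xs = [lo + (hi - lo) * k / segments for k in range(segments + 1)]
    ys = [fn(v) for v in xs]
    return xs, ys


def eval_pwl(xs, ys, v):
    """Linear interpolation between neighbouring breakpoints.

    Inputs outside the table range are clamped to its endpoints.
    """
    v = min(max(v, xs[0]), xs[-1])
    i = min(bisect_right(xs, v) - 1, len(xs) - 2)
    t = (v - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0]
    print(cumsum_sequential(data), cumsum_as_matmul(data))

    xs, ys = build_pwl(softplus, -8.0, 8.0, 64)
    for v in (-3.3, 0.1, 2.7):
        print(v, softplus(v), eval_pwl(xs, ys, v))
```

The mask-based version performs more arithmetic than the loop but has no dependency between output elements, and the lookup table replaces the exponential and logarithm with one comparison search and one multiply-add. More segments reduce the approximation error at the cost of a larger table, which is one way accuracy can be traded for latency.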

Abstract

State-Space Models (SSMs) have emerged as efficient alternatives to transformers for sequential data tasks, offering linear or near-linear scalability with sequence length, making them ideal for long-sequence applications in NLP, vision, and edge AI, including real-time transcription, translation, and contextual search. These applications require lightweight, high-performance models for deployment on resource-constrained devices like laptops and PCs. Designing specialized accelerators for every emerging neural network is costly and impractical; instead, optimizing models for existing NPUs in AI PCs provides a scalable solution. To this end, we propose XAMBA, the first framework to enable and optimize SSMs on commercial off-the-shelf (COTS) state-of-the-art (SOTA) NPUs. XAMBA follows a three-step methodology: (1) enabling SSMs on NPUs, (2) optimizing performance to meet KPI requirements,…

Code & Models

Repositories

arghadippurdue/xamba (PyTorch, official)

Taxonomy

Topics: Neural Networks and Applications · Fault Detection and Control Systems

Methods: Sigmoid Activation · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings