Characterizing the Behavior of Training Mamba-based State Space Models on GPUs

Trinayan Baruah, Kaustubh Shivdikar, Sara Prescott, and David Kaeli

arXiv:2508.17679 · cs.LG · August 26, 2025

TL;DR

This paper evaluates the training behavior of Mamba-based State Space Models on GPUs, providing insights into their computational characteristics and implications for GPU architecture optimization.

Contribution

The paper introduces a workload suite of representative Mamba-based SSMs spanning different model architectures and analyzes their behavior during training on GPUs, highlighting architectural considerations for scaling these models.

Findings

01  Characterized GPU behavior of Mamba-based SSMs during training

02  Identified architectural bottlenecks and optimization opportunities

03  Provided insights for future GPU design to support SSM workloads

Abstract

Mamba-based State Space Models (SSMs) have emerged as a promising alternative to the ubiquitous transformers. Despite the expressive power of transformers, the quadratic complexity of computing attention is a major impediment to scaling performance as sequence length increases. SSMs offer an alternative path that addresses this problem, reducing computational complexity relative to self-attention, with novel model architectures for domains such as video, text generation, and graphs. It is therefore important to characterize the behavior of these emerging workloads on GPUs and to understand their requirements during GPU microarchitectural design. In this work, we evaluate Mamba-based SSMs and characterize their behavior during training on GPUs. We construct a workload suite of representative models that span different model architectures. We then use…
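The scaling contrast in the abstract can be illustrated with a small sketch. It is not the paper's workload suite or Mamba's actual kernel: Mamba uses input-dependent (selective) parameters and a hardware-aware scan, whereas the code below assumes a scalar state with fixed, hypothetical parameters `a`, `b`, and `c`. The function names are likewise made up for illustration. The point is only that a state space recurrence visits each token once, while full self-attention scores every pair of tokens.

```python
def ssm_scan(xs, a, b, c):
    """Run a scalar linear state space recurrence over xs in a single pass.

    Assumed (illustrative) model:
        h_t = a * h_{t-1} + b * x_t
        y_t = c * h_t
    One state update per token, so the work grows linearly with len(xs).
    """
    h = 0.0
    ys = []
    for x in xs:
        h = a * h + b * x   # update the hidden state with the current token
        ys.append(c * h)    # read out the output for this position
    return ys


def scan_step_count(seq_len):
    """State updates performed by the recurrence above: one per token."""
    return seq_len


def attention_score_count(seq_len):
    """Pairwise scores computed by full (unmasked) self-attention."""
    return seq_len * seq_len


if __name__ == "__main__":
    print(ssm_scan([1.0, 0.0, 0.0], a=0.5, b=1.0, c=2.0))
    for n in (1024, 8192):
        print(n, scan_step_count(n), attention_score_count(n))
```

Growing the sequence length by 8x in this sketch grows the recurrence's step count by 8x and the attention score count by 64x, which is the gap the abstract refers to.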
