Characterizing the Behavior of Training Mamba-based State Space Models on GPUs
Trinayan Baruah, Kaustubh Shivdikar, Sara Prescott, and David Kaeli

TL;DR
This paper evaluates the training behavior of Mamba-based State Space Models on GPUs, providing insights into their computational characteristics and implications for GPU architecture optimization.
Contribution
It introduces a workload suite for Mamba-based SSMs and analyzes their GPU performance, highlighting architectural considerations for scaling these models.
Findings
Characterized GPU behavior of Mamba-based SSMs during training
Identified architectural bottlenecks and optimization opportunities
Provided insights for future GPU design to support SSM workloads
Abstract
Mamba-based State Space Models (SSMs) have emerged as a promising alternative to the ubiquitous transformers. Despite the expressive power of transformers, the quadratic complexity of computing attention is a major impediment to scaling performance as the sequence length increases. SSMs provide an alternative path that addresses this problem, reducing the computational complexity requirements of self-attention, with novel model architectures for domains such as video, text generation, and graphs. Thus, it is important to characterize the behavior of these emerging workloads on GPUs and understand their requirements during GPU microarchitectural design. In this work, we evaluate Mamba-based SSMs and characterize their behavior during training on GPUs. We construct a workload suite that offers representative models that span different model architectures. We then use…
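The scaling argument in the abstract can be made concrete with a rough operation count. The sketch below is illustrative only and is not the paper's methodology: the function names, the head dimension of 64, and the state size of 16 are assumptions chosen for the example, and the counts ignore constants, projections, softmax, and memory traffic.

```python
def attention_ops(seq_len: int, d_model: int) -> int:
    """Rough multiply-accumulate count for self-attention scores.

    Q @ K^T builds a (seq_len x seq_len) matrix of d_model-length dot
    products, and applying those weights to V costs the same order again.
    """
    return 2 * seq_len * seq_len * d_model


def ssm_scan_ops(seq_len: int, d_model: int, d_state: int) -> int:
    """Rough count for a linear state-space recurrence.

    Each of the seq_len positions is visited once, updating a
    (d_model x d_state) hidden state per step.
    """
    return seq_len * d_model * d_state


if __name__ == "__main__":
    # Hypothetical sizes, chosen only to show the growth rates.
    for seq_len in (1024, 2048, 4096):
        print(seq_len, attention_ops(seq_len, 64), ssm_scan_ops(seq_len, 64, 16))
```

Under these assumptions, doubling the sequence length quadruples the attention count but only doubles the scan count, which is the gap the abstract refers to.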
