Diffusion-based Symbolic Music Generation with Structured State Space Models

Shenghua Yuan; Xing Tang; Jiatao Chen; Tianming Xie; Jing Wang; Bing Shi

arXiv:2507.20128 · cs.SD · March 4, 2026

TL;DR

This paper introduces SMDIM (Symbolic Music Diffusion with Mamba), a diffusion-based symbolic music generation model that combines Structured State Space Models (SSMs) with a novel Mamba-FeedForward-Attention (MFA) block for scalable, efficient, high-quality long-sequence music synthesis.

Contribution

The paper presents a new diffusion architecture integrating SSMs and MFA blocks, enabling efficient long-sequence symbolic music generation with improved quality and scalability.

Findings

1. Outperforms state-of-the-art models in quality and efficiency
2. Achieves near-linear complexity for long sequences
3. Successfully models traditional Chinese folk music datasets
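
To make the near-linear complexity finding concrete, the sketch below compares rough multiply-add counts for one self-attention layer and one recurrent state-space scan as the sequence grows. The cost formulas are standard back-of-the-envelope estimates, and the function names and the sizes `d_model=256` and `d_state=16` are illustrative assumptions, not values taken from the paper.

```python
def attention_cost(seq_len: int, d_model: int) -> int:
    """Rough multiply-add count for the score and mixing steps of self-attention."""
    # Q @ K^T costs seq_len * seq_len * d_model; weighting V costs the same again.
    return 2 * seq_len * seq_len * d_model


def ssm_scan_cost(seq_len: int, d_model: int, d_state: int = 16) -> int:
    """Rough multiply-add count for a recurrent state-space scan."""
    # Each time step updates d_state values per channel and reads them out once.
    return 2 * seq_len * d_model * d_state


def cost_ratio(seq_len: int, d_model: int = 256, d_state: int = 16) -> float:
    """How many times more work attention does than the scan at this length."""
    return attention_cost(seq_len, d_model) / ssm_scan_cost(seq_len, d_model, d_state)


# Hypothetical sequence lengths, chosen only to show the trend.
for length in (1024, 2048, 4096):
    print(length, cost_ratio(length))
```

Under these assumptions the ratio reduces to `seq_len / d_state`, so doubling the sequence length doubles the scan's cost but quadruples attention's.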

Abstract

Recent advancements in diffusion models have significantly improved symbolic music generation. However, most approaches rely on transformer-based architectures with self-attention mechanisms, which are constrained by quadratic computational complexity, limiting scalability for long sequences. To address this, we propose Symbolic Music Diffusion with Mamba (SMDIM), a novel diffusion-based architecture integrating Structured State Space Models (SSMs) for efficient global context modeling and the Mamba-FeedForward-Attention Block (MFA) for precise local detail preservation. The MFA Block combines the linear complexity of Mamba layers, the non-linear refinement of FeedForward layers, and the fine-grained precision of self-attention mechanisms, achieving a balance between scalability and musical expressiveness. SMDIM achieves near-linear complexity, making it highly efficient for…
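
The three ingredients the abstract names for the MFA Block can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation. The layer order (scan, then feed-forward, then attention) is assumed from the block's name, the residual connections are an assumption, and the fixed-decay recurrence is a simplified stand-in for Mamba's input-dependent selective scan. Every function name and size here is hypothetical, and diffusion timestep conditioning and normalization are omitted.

```python
import math
import random


def ssm_scan(x, decay=0.9):
    """Diagonal linear recurrence h_t = decay*h_{t-1} + (1-decay)*x_t (one pass over time)."""
    state = [0.0] * len(x[0])
    out = []
    for step in x:
        state = [decay * h + (1.0 - decay) * v for h, v in zip(state, step)]
        out.append(state)
    return out


def feed_forward(x, w_in, w_out):
    """Position-wise two-layer MLP with ReLU, applied to each time step independently."""
    out = []
    for step in x:
        hidden = [max(0.0, sum(v * w for v, w in zip(step, row))) for row in w_in]
        out.append([sum(h * w for h, w in zip(hidden, row)) for row in w_out])
    return out


def self_attention(x):
    """Scaled dot-product attention with Q = K = V = x (no learned projections)."""
    scale = math.sqrt(len(x[0]))
    out = []
    for query in x:
        scores = [sum(q * k for q, k in zip(query, key)) / scale for key in x]
        peak = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - peak) for s in scores]
        total = sum(weights)
        out.append([sum(w * value[i] for w, value in zip(weights, x)) / total
                    for i in range(len(query))])
    return out


def add(a, b):
    """Element-wise sum of two sequences of vectors (used for residual connections)."""
    return [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def mfa_like_block(x, w_in, w_out):
    """Assumed ordering: linear-time scan, then feed-forward, then attention."""
    x = add(x, ssm_scan(x))                    # global context in O(seq_len)
    x = add(x, feed_forward(x, w_in, w_out))   # non-linear refinement per position
    x = add(x, self_attention(x))              # pairwise detail in O(seq_len^2)
    return x


def random_matrix(rows, cols, seed):
    """Small deterministic random weights, standing in for trained parameters."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


# Hypothetical toy sizes: 8 time steps, 4 channels, 16 hidden units.
seq_len, d_model, d_hidden = 8, 4, 16
rng = random.Random(0)
tokens = [[rng.uniform(-1.0, 1.0) for _ in range(d_model)] for _ in range(seq_len)]
out = mfa_like_block(tokens,
                     random_matrix(d_hidden, d_model, 1),
                     random_matrix(d_model, d_hidden, 2))
print(len(out), len(out[0]))
```

The block preserves the input shape, so several such blocks could be stacked. In this sketch the attention step is still quadratic in sequence length, and how the paper keeps overall cost near-linear is not visible in the truncated abstract.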
