Mamba-Diffusion Model with Learnable Wavelet for Controllable Symbolic Music Generation

Jincheng Zhang; György Fazekas; Charalampos Saitis

arXiv:2505.03314 · cs.SD · May 7, 2025

TL;DR

This paper introduces Mamba-Diffusion, a novel diffusion model with learnable wavelet transforms and Transformer-Mamba blocks, enabling high-quality, controllable symbolic music generation from pianoroll representations.
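As a rough illustration of the pianoroll representation mentioned above, the sketch below turns a list of notes into a binary pitch-by-time grid and back. The function names, the `(pitch, onset, duration)` tuple layout, and the 128-pitch range are assumptions made for this example; the listing does not say how the paper builds or quantises its pianorolls.

```python
def notes_to_pianoroll(notes, n_steps, n_pitches=128):
    """Render (pitch, onset_step, duration_steps) tuples as a binary grid.

    Hypothetical helper: rows are pitches, columns are time steps.
    """
    roll = [[0] * n_steps for _ in range(n_pitches)]
    for pitch, onset, duration in notes:
        if not 0 <= pitch < n_pitches:
            raise ValueError(f"pitch {pitch} is outside 0..{n_pitches - 1}")
        # Clip the note to the grid so out-of-range onsets do not fail.
        for t in range(max(onset, 0), min(onset + duration, n_steps)):
            roll[pitch][t] = 1
    return roll


def pianoroll_to_notes(roll):
    """Recover notes by scanning each pitch row for runs of active cells."""
    notes = []
    for pitch, row in enumerate(roll):
        start = None
        # A trailing 0 closes any run that reaches the end of the row.
        for t, active in enumerate(row + [0]):
            if active and start is None:
                start = t
            elif not active and start is not None:
                notes.append((pitch, start, t - start))
                start = None
    return sorted(notes, key=lambda n: (n[1], n[0]))


# A C major triad arpeggio fragment on an 8-step grid.
example_notes = [(60, 0, 4), (64, 0, 4), (67, 4, 2)]
example_roll = notes_to_pianoroll(example_notes, n_steps=8)
```

A plain binary grid like this cannot distinguish one held note from two back-to-back notes of the same pitch, so real pipelines often add an onset channel or velocity values; whether the paper does so is not stated here.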

Contribution

It proposes a new diffusion model tailored for symbolic music using image-like pianorolls, incorporating learnable wavelet transforms and Transformer-Mamba blocks for improved controllability.
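To give a sense of what a wavelet transform does to a row of a pianoroll, here is a minimal sketch of one level of the fixed Haar transform, which splits a signal into a half-length coarse part and a half-length detail part and can be inverted exactly. This is the textbook Haar filter, not the paper's learnable transform; how the authors parameterise or train their wavelet is not described in this listing, and the function names are hypothetical.

```python
import math


def haar_forward(signal):
    """One level of the orthonormal Haar transform on an even-length list."""
    if len(signal) % 2 != 0:
        raise ValueError("signal length must be even")
    s = 1.0 / math.sqrt(2.0)
    # Pairwise scaled sums give the coarse signal, scaled differences the detail.
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail


def haar_inverse(approx, detail):
    """Invert haar_forward, reconstructing the original samples."""
    s = 1.0 / math.sqrt(2.0)
    out = []
    for a, d in zip(approx, detail):
        out.extend([(a + d) * s, (a - d) * s])
    return out


# One pitch row of a pianoroll: note on for two steps, off, then isolated hits.
row = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0]
coarse, fine = haar_forward(row)
restored = haar_inverse(coarse, fine)
```

A learnable variant would presumably replace the fixed sum and difference filters with trained weights, but that is an assumption about the general idea rather than a description of the paper's design.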

Findings

01 Outperforms the baseline in pianoroll generation quality

02 Achieves effective control over the generated music with target chords

03 Demonstrates promising results in symbolic music synthesis

Abstract

The recent surge in the popularity of diffusion models for image synthesis has attracted new attention to their potential for generation tasks in other domains. However, their applications to symbolic music generation remain largely under-explored because symbolic music is typically represented as sequences of discrete events and standard diffusion models are not well-suited for discrete data. We represent symbolic music as image-like pianorolls, facilitating the use of diffusion models for the generation of symbolic music. Moreover, this study introduces a novel diffusion model that incorporates our proposed Transformer-Mamba block and learnable wavelet transform. Classifier-free guidance is utilised to generate symbolic music with target chords. Our evaluation shows that our method achieves compelling results in terms of music quality and controllability, outperforming the strong…
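The classifier-free guidance step mentioned in the abstract can be sketched as follows. At sampling time the model is evaluated twice, once with the chord condition and once without, and the two noise predictions are blended. The sketch uses one common convention, where a scale of 1 reproduces the conditional prediction; other write-ups shift the scale by one. The function names, the flat-list representation, and the use of `None` as the "no chord" label are assumptions for illustration, not details taken from the paper.

```python
import random


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Blend unconditional and conditional noise predictions.

    guidance_scale = 0 gives the unconditional prediction, 1 the conditional
    one, and values above 1 push further towards the condition.
    """
    if len(eps_uncond) != len(eps_cond):
        raise ValueError("predictions must have the same length")
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def maybe_drop_condition(chord_label, p_uncond, null_label=None, rng=random):
    """Training-time condition dropout (hypothetical helper).

    With probability p_uncond the chord label is replaced by a null label,
    so a single network learns both conditional and unconditional denoising.
    """
    return null_label if rng.random() < p_uncond else chord_label


# Toy predictions for two pianoroll cells.
guided = cfg_combine([0.0, 1.0], [1.0, 3.0], guidance_scale=2.0)
```

Larger guidance scales usually tighten adherence to the target chord at some cost in sample diversity; the scale used in the paper is not given in this listing.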

Code & Models

Repositories

jinchengzhanggg/proffusion (PyTorch, official)

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Softmax · Attention Is All You Need · Diffusion