SDMuse: Stochastic Differential Music Editing and Generation via Hybrid Representation
Chen Zhang, Yi Ren, Kejun Zhang, Shuicheng Yan

TL;DR
SDMuse is a framework for fine-grained music editing and generation. It pairs a diffusion model driven by a stochastic differential equation (SDE) with a hybrid pianoroll and MIDI-event representation, so the same model can compose new pieces and modify existing ones.
Contribution
The paper presents SDMuse, a unified two-stage model that combines diffusion-based generation and auto-regressive refinement for flexible music editing and creation.
Findings
Effective across editing tasks: combination, continuation, inpainting, and style transfer
High-quality generation of whole musical pieces from scratch
A single unified model covers both generation and editing
Abstract
While deep generative models have empowered music generation, it remains a challenging and under-explored problem to edit an existing musical piece at fine granularity. In this paper, we propose SDMuse, a unified Stochastic Differential Music editing and generation framework, which can not only compose a whole musical piece from scratch, but also modify existing musical pieces in many ways, such as combination, continuation, inpainting, and style transferring. The proposed SDMuse follows a two-stage pipeline to achieve music generation and editing on top of a hybrid representation including pianoroll and MIDI-event. In particular, SDMuse first generates/edits pianoroll by iteratively denoising through a stochastic differential equation (SDE) based on a diffusion model generative prior, and then refines the generated pianoroll and predicts MIDI-event tokens auto-regressively. We evaluate…
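To make the hybrid representation concrete, the sketch below flattens a binary pianoroll into a MIDI-like event sequence. It is an illustration only, not the paper's tokenizer: the function name `pianoroll_to_events`, the token names (`NOTE_ON_*`, `NOTE_OFF_*`, `TIME_SHIFT_*`), and the `lowest_pitch` parameter are all assumptions made for this example. In SDMuse the second stage predicts MIDI-event tokens auto-regressively with a learned model, whereas this sketch uses a fixed rule.

```python
from typing import List

# roll[t][p] == 1 means pitch index p sounds during time step t.
Pianoroll = List[List[int]]


def pianoroll_to_events(roll: Pianoroll, lowest_pitch: int = 60) -> List[str]:
    """Flatten a binary pianoroll into a toy MIDI-like event sequence.

    Hypothetical token scheme (an assumption, not taken from the paper):
    NOTE_ON_<pitch>, NOTE_OFF_<pitch>, TIME_SHIFT_<steps>.
    """
    events: List[str] = []
    n_pitches = len(roll[0]) if roll else 0
    previous = [0] * n_pitches  # nothing sounds before the first step
    pending_shift = 0           # steps elapsed since the last emitted event

    # A trailing silent frame closes any notes still held at the end.
    for frame in roll + [[0] * n_pitches]:
        changes: List[str] = []
        # Emit note-offs before note-ons within the same time step.
        for p in range(n_pitches):
            if previous[p] and not frame[p]:
                changes.append(f"NOTE_OFF_{lowest_pitch + p}")
        for p in range(n_pitches):
            if frame[p] and not previous[p]:
                changes.append(f"NOTE_ON_{lowest_pitch + p}")
        if changes:
            if pending_shift:
                events.append(f"TIME_SHIFT_{pending_shift}")
                pending_shift = 0
            events.extend(changes)
        pending_shift += 1
        previous = frame
    return events


if __name__ == "__main__":
    # Two pitches over four time steps: pitch 60 sounds in steps 0-1,
    # pitch 61 sounds in steps 1-2.
    roll = [[1, 0], [1, 1], [0, 1], [0, 0]]
    print(pianoroll_to_events(roll))
```

Note that a plain binary pianoroll cannot distinguish one held note from the same pitch struck again in consecutive steps: a pitch active for three steps becomes a single `NOTE_ON`, `TIME_SHIFT_3`, `NOTE_OFF` triple here. An event sequence can encode that distinction, which is one reason to work with both representations.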
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis
