Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models

Ziyu Wang; Lejun Min; Gus Xia

arXiv:2405.09901 · cs.SD · May 17, 2024 · 1 citation

TL;DR

This paper introduces a hierarchical diffusion model for whole-song symbolic music generation, capturing global structure and local details, enabling controllable and high-quality full-piece music synthesis.

Contribution

The paper presents what the authors describe as the first attempt at hierarchical modeling of a full song, using cascaded diffusion models in which each level captures a different scope of musical semantics and is conditioned on the levels above it.

Findings

1. The generated music exhibits a recognizable verse-chorus structure.

2. The quality of the generated music surpasses that of the baseline models.

3. The model allows flexible control over musical features.

Abstract

Recent deep music generation studies have put much emphasis on long-term generation with structures. However, we are yet to see high-quality, well-structured whole-song generation. In this paper, we make the first attempt to model a full music piece under the realization of compositional hierarchy. With a focus on symbolic representations of pop songs, we define a hierarchical language, in which each level of hierarchy focuses on the semantics and context dependency at a certain music scope. The high-level languages reveal whole-song form, phrase, and cadence, whereas the low-level languages focus on notes, chords, and their local patterns. A cascaded diffusion model is trained to model the hierarchical language, where each level is conditioned on its upper levels. Experiments and analysis show that our model is capable of generating full-piece music with recognizable global…
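
The top-down conditioning described in the abstract can be illustrated with a toy control-flow sketch. This is not the authors' implementation: the level names, the sequence lengths, and `placeholder_denoiser` are all hypothetical stand-ins, and the "denoiser" is a hand-written function rather than a trained network. The sketch only shows the cascade itself: levels are sampled from coarsest to finest, and each level receives every previously generated level as its condition.

```python
import random
from typing import Callable, Dict, List

Level = List[float]
Denoiser = Callable[[Level, Dict[str, Level], int], Level]

# Hypothetical level names and lengths, ordered from coarse to fine.
# The paper's actual hierarchy and representations may differ.
LEVEL_NAMES = ["form", "phrase", "chords", "notes"]
LEVEL_LENGTHS = {"form": 4, "phrase": 8, "chords": 16, "notes": 32}


def placeholder_denoiser(x: Level, cond: Dict[str, Level], t: int) -> Level:
    """Stand-in for a trained network (an assumption, not the paper's model).

    Nudges each value toward the mean of the conditioning levels,
    or toward 0.0 when there is no condition (the top level).
    """
    values = [v for seq in cond.values() for v in seq]
    target = sum(values) / len(values) if values else 0.0
    return [xi + (target - xi) / (t + 2) for xi in x]


def sample_level(length: int, cond: Dict[str, Level], denoiser: Denoiser,
                 num_steps: int, rng: random.Random) -> Level:
    """Run a toy reverse process: start from Gaussian noise, refine step by step."""
    x = [rng.gauss(0.0, 1.0) for _ in range(length)]
    for t in reversed(range(num_steps)):
        x = denoiser(x, cond, t)
    return x


def sample_cascade(denoiser: Denoiser = placeholder_denoiser,
                   num_steps: int = 10, seed: int = 0) -> Dict[str, Level]:
    """Sample all levels top-down; each level is conditioned on its upper levels."""
    rng = random.Random(seed)
    generated: Dict[str, Level] = {}
    for name in LEVEL_NAMES:
        cond = dict(generated)  # snapshot of every level generated so far
        generated[name] = sample_level(LEVEL_LENGTHS[name], cond,
                                       denoiser, num_steps, rng)
    return generated


if __name__ == "__main__":
    song = sample_cascade()
    for name, seq in song.items():
        print(name, len(seq))
```

In a real system each level would have its own trained conditional diffusion model; a single placeholder function is shared here purely to keep the sketch short.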

Videos

Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models · SlidesLive

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Neuroscience and Music Perception

Methods: Focus · Diffusion