Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation
Matteo Pettenò, Alessandro Ilic Mezza, Alberto Bernardini

TL;DR
This paper uses denoising diffusion processes as plug-and-play latent constraints for controllable symbolic music generation, giving users precise, fader-like control over specific musical attributes.
Contribution
The paper presents a versatile framework in which a library of small conditional diffusion models acts as implicit priors on the latents of a frozen unconditional backbone, improving attribute control in symbolic music generation.
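As a rough illustration of the idea, and not the authors' implementation, the following Python sketch runs a standard DDPM-style reverse process in a toy latent space and passes the result to a stand-in decoder. Everything here is an assumption made for illustration: the names `make_schedule`, `toy_eps_model`, `sample_latent`, and `frozen_decoder`, the linear noise schedule, the latent size, and the convention that the first latent coordinate encodes the controlled attribute. In a real system the denoiser would be a small trained network conditioned on the target attribute, and the decoder would be the frozen unconditional backbone.

```python
import numpy as np


def make_schedule(num_steps=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (an assumed choice, not from the paper)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    return betas, alphas, alpha_bars


def toy_eps_model(z_t, t, target, alpha_bars):
    """Hypothetical stand-in for a small conditional denoiser.

    It pretends the clean latent is all zeros except for the first
    coordinate, which equals the target attribute, and returns the noise
    that would explain z_t under that belief.
    """
    z0_hat = np.zeros_like(z_t)
    z0_hat[0] = target
    return (z_t - np.sqrt(alpha_bars[t]) * z0_hat) / np.sqrt(1.0 - alpha_bars[t])


def sample_latent(target, dim=8, num_steps=50, seed=0):
    """Ancestral DDPM sampling of a latent conditioned on `target`."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    z = rng.standard_normal(dim)  # start from pure Gaussian noise
    for t in reversed(range(num_steps)):
        eps = toy_eps_model(z, t, target, alpha_bars)
        # Posterior mean of the reverse step given the predicted noise.
        mean = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        # No fresh noise is added at the final step.
        noise = rng.standard_normal(dim) if t > 0 else np.zeros(dim)
        z = mean + np.sqrt(betas[t]) * noise
    return z


def frozen_decoder(z):
    """Placeholder for the frozen backbone decoder (identity here)."""
    return z.copy()


z = sample_latent(target=0.7)
output = frozen_decoder(z)
```

With this toy denoiser the first coordinate of the sampled latent lands on the requested value. A trained conditional model would instead sample from a learned distribution of latents consistent with the attribute, and the backbone itself would stay untouched.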
Findings
Diffusion-driven constraints outperform traditional attribute regularization.
The approach achieves stronger correlations between target and generated attributes.
High perceptual quality and diversity are maintained in generated music.
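This summary does not say which correlation measure is used. As a minimal sketch assuming Pearson correlation, agreement between requested and measured attribute values could be computed as below. The function name `pearson_r` and the example numbers are illustrative placeholders, not results from the paper.

```python
import numpy as np


def pearson_r(targets, measured):
    """Pearson correlation between requested and measured attribute values."""
    t = np.asarray(targets, dtype=float)
    m = np.asarray(measured, dtype=float)
    if t.shape != m.shape:
        raise ValueError("targets and measured must have the same shape")
    t_c = t - t.mean()
    m_c = m - m.mean()
    denom = np.sqrt((t_c ** 2).sum() * (m_c ** 2).sum())
    if denom == 0.0:
        raise ValueError("correlation is undefined for constant input")
    return float((t_c * m_c).sum() / denom)


# Made-up values: attribute requested for each sample vs. the same
# attribute measured on the generated sample.
targets = [0.1, 0.3, 0.5, 0.7, 0.9]
measured = [0.15, 0.28, 0.55, 0.66, 0.93]
r = pearson_r(targets, measured)
```

A value of `r` close to 1 would indicate that the generated music tracks the requested attribute closely.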
Abstract
Recent advances in latent diffusion models have demonstrated state-of-the-art performance in high-dimensional time-series data synthesis while providing flexible control through conditioning and guidance. However, existing methodologies primarily rely on musical context or natural language as the main modality of interacting with the generative process, which may not be ideal for expert users who seek precise fader-like control over specific musical attributes. In this work, we explore the application of denoising diffusion processes as plug-and-play latent constraints for unconditional symbolic music generation models. We focus on a framework that leverages a library of small conditional diffusion models operating as implicit probabilistic priors on the latents of a frozen unconditional backbone. While previous studies have explored domain-specific use cases, this work, to the best of…
Taxonomy
Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing
