Conditional Diffusion as Latent Constraints for Controllable Symbolic Music Generation

Matteo Pettenò; Alessandro Ilic Mezza; Alberto Bernardini

arXiv:2511.07156·cs.LG·November 11, 2025

Open Access

TL;DR

This paper uses denoising diffusion processes as plug-and-play latent constraints on unconditional symbolic music generation models, giving users precise, fader-like control over specific musical attributes.

Contribution

The paper presents a framework in which a library of small conditional diffusion models acts as implicit probabilistic priors on the latents of a frozen unconditional backbone, improving attribute control in symbolic music generation.
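
To make the idea of a diffusion prior over the latents of a frozen backbone more concrete, here is a minimal toy sketch in NumPy. It is not the authors' implementation. Everything in it is an assumption made for illustration: the decoder is a fixed random map rather than a pretrained music model, the attribute is a scalar read off one latent direction, and the conditional denoiser is replaced by the closed-form noise prediction for a Gaussian latent distribution centred on the target attribute value. The names `frozen_decode`, `eps_hat`, and `sample_latents` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
LATENT_DIM, OUT_DIM, T = 8, 16, 1000

# Frozen "backbone" decoder: a fixed random map standing in for a
# pretrained unconditional model (assumption, for illustration only).
W = rng.standard_normal((OUT_DIM, LATENT_DIM))

def frozen_decode(z):
    return np.tanh(z @ W.T)

# Standard linear DDPM noise schedule.
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_hat(z_t, t, target, direction, spread):
    """Closed-form noise prediction for the toy conditional latent
    distribution N(target * direction, spread**2 * I). A small trained
    conditional denoiser would take the place of this function."""
    ab = alpha_bars[t]
    var_t = ab * spread**2 + (1.0 - ab)  # variance of the noised latent
    mean_t = np.sqrt(ab) * target * direction
    return np.sqrt(1.0 - ab) * (z_t - mean_t) / var_t

def sample_latents(target, direction, n=2000, spread=0.1):
    """DDPM ancestral sampling in latent space, conditioned on `target`."""
    z = rng.standard_normal((n, LATENT_DIM))  # start from pure noise
    for t in range(T - 1, -1, -1):
        eps = eps_hat(z, t, target, direction, spread)
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # no noise is added at the final step
            z = z + np.sqrt(betas[t]) * rng.standard_normal(z.shape)
    return z

# Hypothetical attribute: the latent coordinate along `direction`.
direction = np.zeros(LATENT_DIM)
direction[0] = 1.0

z = sample_latents(target=1.5, direction=direction)
x = frozen_decode(z)          # the backbone itself is never modified
readout = z @ direction       # attribute value carried by each latent
print(readout.mean(), readout.std(), x.shape)
```

In this sketch, changing `target` moves the sampled latents along the attribute direction while the decoder stays untouched, which is the sense in which the prior acts as a plug-and-play constraint. A library of such priors would amount to one denoiser per attribute, all sharing the same frozen backbone.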

Findings

01. Diffusion-driven constraints outperform traditional attribute regularization.

02. The approach achieves stronger correlations between target and generated attributes.

03. High perceptual quality and diversity are maintained in generated music.

Abstract

Recent advances in latent diffusion models have demonstrated state-of-the-art performance in high-dimensional time-series data synthesis while providing flexible control through conditioning and guidance. However, existing methodologies primarily rely on musical context or natural language as the main modality of interacting with the generative process, which may not be ideal for expert users who seek precise fader-like control over specific musical attributes. In this work, we explore the application of denoising diffusion processes as plug-and-play latent constraints for unconditional symbolic music generation models. We focus on a framework that leverages a library of small conditional diffusion models operating as implicit probabilistic priors on the latents of a frozen unconditional backbone. While previous studies have explored domain-specific use cases, this work, to the best of…

Taxonomy

Topics: Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis · Music and Audio Processing