Music Consistency Models
Zhengcong Fei, Mingyuan Fan, Junshi Huang

TL;DR
Music Consistency Models (MusicCM) introduce an efficient method for high-quality, real-time music synthesis by leveraging consistency distillation and multiple diffusion processes, significantly reducing sampling steps.
Contribution
This work pioneers the application of consistency models to music generation, achieving high fidelity with minimal sampling steps and enabling real-time synthesis.
Findings
MusicCM achieves seamless music synthesis with only four sampling steps.
The model maintains high quality and naturalness in generated music.
Extended coherent music can be generated using multiple diffusion processes.
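The four-step sampling mentioned in the findings can be illustrated with a generic multistep consistency sampler. This is only a minimal sketch under stated assumptions, not the authors' implementation: `consistency_fn` is a toy analytic stand-in (the exact consistency function for one-dimensional Gaussian data) where MusicCM would use its distilled network, and the noise levels `EPS`, `T_MAX`, and `FOUR_STEPS` are hypothetical values chosen for illustration.

```python
import math
import random

EPS = 0.002                 # smallest noise level (assumed value)
T_MAX = 80.0                # largest noise level (assumed value)
MU, SIGMA_DATA = 0.5, 1.0   # toy 1-D Gaussian "data" distribution (assumed)


def consistency_fn(x, t):
    """Toy stand-in for a distilled model.

    For data ~ N(MU, SIGMA_DATA^2), this maps a point at noise level t
    to the start (noise level EPS) of its probability-flow ODE trajectory.
    """
    scale = math.sqrt(SIGMA_DATA**2 + EPS**2) / math.sqrt(SIGMA_DATA**2 + t**2)
    return MU + (x - MU) * scale


def multistep_sample(timesteps, dim, rng):
    """Generic multistep consistency sampling.

    timesteps: decreasing noise levels, starting at T_MAX.
    One model evaluation is spent per entry in timesteps.
    """
    # Step 1: draw pure noise at the largest level and denoise in one jump.
    x = [consistency_fn(rng.gauss(0.0, timesteps[0]), timesteps[0])
         for _ in range(dim)]
    # Remaining steps: re-noise to a smaller level, then denoise again.
    for t in timesteps[1:]:
        noise_std = math.sqrt(t**2 - EPS**2)
        x_noisy = [v + noise_std * rng.gauss(0.0, 1.0) for v in x]
        x = [consistency_fn(v, t) for v in x_noisy]
    return x


FOUR_STEPS = [T_MAX, 20.0, 5.0, 1.0]   # four model evaluations in total
sample = multistep_sample(FOUR_STEPS, dim=8, rng=random.Random(0))
```

Each pass through the loop trades one extra model evaluation for a refinement of the previous estimate, which is why the step count can stay small compared with a standard diffusion sampler.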
Abstract
Consistency models have exhibited remarkable capabilities in facilitating efficient image/video generation, enabling synthesis with minimal sampling steps. They have proven advantageous in mitigating the computational burdens associated with diffusion models. Nevertheless, the application of consistency models in music generation remains largely unexplored. To address this gap, we present Music Consistency Models (MusicCM), which leverage the concept of consistency models to efficiently synthesize mel-spectrograms for music clips, maintaining high quality while minimizing the number of sampling steps. Building upon existing text-to-music diffusion models, the MusicCM model incorporates consistency distillation and adversarial discriminator training. Moreover, we find it beneficial to generate extended coherent music by incorporating multiple diffusion processes with…
Taxonomy
Topics: Music Technology and Sound Studies
Methods: Consistency Models · Diffusion
