Melody-Guided Music Generation
Shaopeng Wei, Manzhen Wei, Haoyu Wang, Yu Zhao, Gang Kou

TL;DR
The paper introduces MG2, a melody-guided text-to-music generation model that outperforms existing models with fewer resources by aligning text, audio, and melody using contrastive pretraining and a retrieval-augmented diffusion process.
Contribution
It proposes a novel contrastive language-music pretraining method and a melody-guided diffusion model for efficient, high-quality text-to-music generation with limited data and parameters.
Findings
MG2 surpasses current open-source models in quality.
Achieves high performance with fewer than one-third of the parameters of competing models.
Human evaluations confirm practical effectiveness.
Abstract
We present the Melody-Guided Music Generation (MG2) model, a novel approach that uses melody to guide text-to-music generation and that, despite a simple method and limited resources, achieves excellent performance. Specifically, we first align the text with audio waveforms and their associated melodies using the newly proposed Contrastive Language-Music Pretraining, so that the learned text representation is fused with implicit melody information. Subsequently, we condition the retrieval-augmented diffusion module on both the text prompt and the retrieved melody. This allows MG2 to generate music that reflects the content of the given text description while keeping the intrinsic harmony under the guidance of explicit melody information. We conducted extensive experiments on two public datasets: MusicCaps and MusicBench. Surprisingly, the experimental results demonstrate that the proposed MG2…
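The two steps described in the abstract, contrastive alignment of text with audio and melody followed by melody retrieval for conditioning, can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify the loss or the retrieval rule, so the symmetric InfoNCE loss, cosine-similarity retrieval, the temperature value, and all function names (`info_nce`, `symmetric_contrastive_loss`, `retrieve_melody`) are assumptions made for illustration, and the embeddings are hand-written 2-D vectors rather than encoder outputs.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def info_nce(anchors, targets, temperature=0.1):
    """Mean cross-entropy of matching anchors[i] to targets[i] against all targets."""
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, t) / temperature for t in targets]
        top = max(logits)  # subtract the max for a numerically stable log-sum-exp
        log_z = top + math.log(sum(math.exp(x - top) for x in logits))
        total += log_z - logits[i]
    return total / len(anchors)

def symmetric_contrastive_loss(text, other, temperature=0.1):
    """Average of the text-to-other and other-to-text InfoNCE terms (assumed form)."""
    return 0.5 * (info_nce(text, other, temperature)
                  + info_nce(other, text, temperature))

def retrieve_melody(text_embedding, melody_bank):
    """Index of the melody embedding most similar to the text embedding."""
    return max(range(len(melody_bank)),
               key=lambda i: cosine(text_embedding, melody_bank[i]))

# Hypothetical toy embeddings: row i of each list belongs to the same clip.
text = [[1.0, 0.0], [0.0, 1.0]]
audio = [[0.9, 0.1], [0.2, 0.8]]
melody = [[0.8, 0.2], [0.1, 0.9]]

# Assumed pretraining objective: align text with audio and with melody.
loss = symmetric_contrastive_loss(text, audio) + symmetric_contrastive_loss(text, melody)

# At generation time, a new prompt embedding selects a melody to condition on.
best = retrieve_melody([0.1, 1.0], melody)
print(loss, best)
```

In this sketch the retrieved index would select the melody passed to the diffusion module alongside the text prompt; the diffusion model itself is out of scope here.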
Taxonomy
Topics: Diverse Music Education Insights · Music Technology and Sound Studies · Music History and Culture
Methods: Diffusion · ALIGN
