TL;DR
SongBloom is a novel full-length song generation framework that combines autoregressive sketching with diffusion refinement to produce coherent, high-quality music with improved global structure and local fidelity.
Contribution
It introduces an interleaved autoregressive diffusion approach for scalable, coherent song generation, outperforming existing methods in quality and coherence.
Findings
Outperforms existing methods on subjective and objective metrics.
Achieves performance comparable to state-of-the-art commercial platforms.
Effectively balances global coherence with local fidelity in generated music.
Abstract
Generating music with coherent structure and harmonious instrumental and vocal elements remains a significant challenge in song generation. Existing language models and diffusion-based methods often struggle to balance global coherence with local fidelity, resulting in outputs that lack musicality or suffer from incoherent progression and mismatched lyrics. This paper introduces SongBloom, a novel framework for full-length song generation that leverages an interleaved paradigm of autoregressive sketching and diffusion-based refinement. SongBloom employs an autoregressive diffusion model that combines the high fidelity of diffusion models with the scalability of language models. Specifically, it gradually extends a musical sketch from short to long and refines the details from coarse to fine-grained. The interleaved generation paradigm effectively integrates prior semantic and…
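The interleaved paradigm described in the abstract can be pictured with a toy control loop. The sketch below is purely illustrative and is not SongBloom's implementation: the function names (`sketch_next_patch`, `refine_patch`, `generate`), the patch size, the integer "sketch tokens", and the linear denoising rule are all assumptions made here, standing in for a learned autoregressive model and a learned diffusion model. It only shows the ordering of operations: extend the coarse sketch by one patch, refine that patch from noise, then repeat.

```python
import random


def sketch_next_patch(sketch_history, patch_len, rng):
    """Hypothetical autoregressive stage: emit the next patch of coarse
    sketch tokens, conditioned on the last token generated so far."""
    last = sketch_history[-1] if sketch_history else 0
    patch = []
    for _ in range(patch_len):
        # Toy "model": a random walk over a 16-token vocabulary.
        last = (last + rng.randint(0, 3)) % 16
        patch.append(last)
    return patch


def refine_patch(sketch_patch, steps, rng):
    """Hypothetical diffusion stage: start from Gaussian noise and move
    step by step toward a fine-grained target implied by the sketch."""
    target = [tok / 16.0 for tok in sketch_patch]  # assumed toy target
    x = [rng.gauss(0.0, 1.0) for _ in sketch_patch]
    for step in range(steps):
        # Step size grows to 1.0 on the final step, so x lands on target.
        alpha = 1.0 / (steps - step)
        x = [xi + alpha * (ti - xi) for xi, ti in zip(x, target)]
    return x


def generate(num_patches, patch_len=4, steps=8, seed=0):
    """Interleave the two stages: sketch one patch, refine it, repeat."""
    rng = random.Random(seed)
    sketch, audio = [], []
    for _ in range(num_patches):
        patch = sketch_next_patch(sketch, patch_len, rng)  # short to long
        sketch.extend(patch)
        audio.extend(refine_patch(patch, steps, rng))      # coarse to fine
    return sketch, audio


if __name__ == "__main__":
    sketch, audio = generate(num_patches=3)
    print(len(sketch), len(audio))
```

In this toy, each refined patch depends only on its own sketch patch. A real system of the kind the abstract describes would presumably also condition each stage on previously generated context, but the details are not given in the text above.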
Taxonomy
Methods: Diffusion
