SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement

Chenyu Yang; Shuai Wang; Hangting Chen; Wei Tan; Jianwei Yu; Haizhou Li

arXiv:2506.07634 · eess.AS · October 23, 2025

TL;DR

SongBloom is a novel full-length song generation framework that combines autoregressive sketching with diffusion refinement to produce coherent, high-quality music with improved global structure and local fidelity.

Contribution

It introduces an interleaved autoregressive diffusion approach for scalable, coherent song generation, outperforming existing methods in quality and coherence.

Findings

1. Outperforms existing methods on subjective and objective metrics.
2. Achieves performance comparable to state-of-the-art commercial platforms.
3. Effectively balances global coherence with local fidelity in generated music.

Abstract

Generating music with coherent structure and harmonious instrumental and vocal elements remains a significant challenge in song generation. Existing language-model and diffusion-based methods often struggle to balance global coherence with local fidelity, producing outputs that lack musicality or suffer from incoherent progression and mismatched lyrics. This paper introduces SongBloom, a novel framework for full-length song generation that leverages an interleaved paradigm of autoregressive sketching and diffusion-based refinement. SongBloom employs an autoregressive diffusion model that combines the high fidelity of diffusion models with the scalability of language models. Specifically, it gradually extends a musical sketch from short to long and refines the details from coarse to fine-grained. The interleaved generation paradigm effectively integrates prior semantic and…
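The abstract describes the generation order only at a high level. The toy Python sketch below, which assumes a simple patch-by-patch loop, illustrates what interleaving autoregressive sketching with iterative refinement could look like: each iteration extends the sketch by one token (short to long) and then refines one short audio patch starting from noise (coarse to fine), conditioned on what has already been generated. `sketch_step`, `refine_patch`, and `generate` are hypothetical names, and the arithmetic rules inside them are placeholders for SongBloom's actual language model and diffusion model, which this summary does not specify. The refinement loop only imitates the noise-to-target shape of diffusion sampling, not its mathematics.

```python
import random
from typing import List, Tuple

# Hypothetical toy stand-ins for illustration only; these are NOT SongBloom's models.

def sketch_step(tokens: List[int], audio: List[float], vocab: int = 8) -> int:
    """Toy autoregressive 'sketch' model: picks the next coarse token from the
    previous token plus a crude cue taken from the most recently refined audio."""
    prev = tokens[-1] if tokens else 0
    bias = 1 if audio and audio[-1] > 5.0 else 0
    return (prev + 3 + bias) % vocab

def refine_patch(token: int, audio: List[float], patch_len: int,
                 steps: int, rng: random.Random) -> List[float]:
    """Toy iterative refinement standing in for a diffusion sampler: starts from
    Gaussian noise and moves toward a target set by the sketch token and the
    previous patch, landing exactly on the target at the final step."""
    if steps < 1:
        raise ValueError("steps must be >= 1")
    anchor = audio[-1] if audio else 0.0
    target = [0.5 * anchor + token] * patch_len
    x = [rng.gauss(0.0, 1.0) for _ in range(patch_len)]  # start from pure noise
    for t in range(steps, 0, -1):
        alpha = 1.0 / t  # partial moves toward the target early, full move at t == 1
        x = [(1.0 - alpha) * xi + alpha * ti for xi, ti in zip(x, target)]
    return x

def generate(num_patches: int, patch_len: int = 4, steps: int = 10,
             seed: int = 0) -> Tuple[List[int], List[float]]:
    """Interleave the two stages: one sketch token, then one refined patch."""
    rng = random.Random(seed)
    tokens: List[int] = []
    audio: List[float] = []
    for _ in range(num_patches):
        tok = sketch_step(tokens, audio)  # 1) extend the sketch (short -> long)
        tokens.append(tok)
        # 2) refine a new patch conditioned on earlier output (coarse -> fine)
        audio.extend(refine_patch(tok, audio, patch_len, steps, rng))
    return tokens, audio

if __name__ == "__main__":
    toks, wav = generate(num_patches=3, patch_len=2, steps=5)
    print(toks, wav)
```

The point the toy is meant to show is the alternation: the sketch is never fully generated up front, so each new sketch token can depend on refined output from earlier patches, and each refined patch can depend on both the new token and prior audio. How SongBloom actually passes context between its two stages goes beyond what this summary states.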

Code & Models

Repositories

cypress-yang/songbloom (PyTorch, official)

Videos

SongBloom: Coherent Song Generation via Interleaved Autoregressive Sketching and Diffusion Refinement (SlidesLive)

Taxonomy

Methods: Diffusion