LeVo: High-Quality Song Generation with Multi-Preference Alignment

Shun Lei; Yaoxun Xu; Zhiwei Lin; Huaicheng Zhang; Wei Tan; Hangting Chen; Jianwei Yu; Yixuan Zhang; Chenyu Yang; Haina Zhu; Shuai Wang; Zhiyong Wu; Dong Yu

arXiv:2506.07520 · cs.SD · October 24, 2025

TL;DR

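LeVo is a language-model-based framework for song generation that combines dual-token modeling (mixed tokens and dual-track tokens) with multi-preference alignment to improve musicality, vocal-instrument harmony, and instruction following. It outperforms existing open-source methods on objective metrics and is competitive with industry systems.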

Contribution

Introduces LeVo, a language-model-based framework that pairs parallel modeling of mixed and dual-track tokens with multi-preference alignment to raise song generation quality.

Findings

01 · Outperforms existing open-source methods in objective metrics

02 · Achieves competitive results with industry systems

03 · Ablation studies confirm the effectiveness of design choices

Abstract

Recent advances in large language models (LLMs) and audio language models have significantly improved music generation, particularly in lyrics-to-song generation. However, existing approaches still struggle with the complex composition of songs and the scarcity of high-quality data, leading to limitations in audio quality, musicality, instruction following, and vocal-instrument harmony. To address these challenges, we introduce LeVo, a language-model-based framework consisting of LeLM and a Music Codec. LeLM models two types of tokens in parallel: mixed tokens, which represent the combined audio of vocals and accompaniment to achieve better vocal-instrument harmony, and dual-track tokens, which encode vocals and accompaniment separately for high-quality song generation. It employs two decoder-only transformers and a modular extension training strategy to prevent…

Code & Models

Repositories

tencent-ailab/songgeneration · PyTorch · Official

Videos

LeVo: High-Quality Song Generation with Multi-Preference Alignment · SlidesLive

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Generative Adversarial Networks and Image Synthesis

Methods: Direct Preference Optimization
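
For readers unfamiliar with Direct Preference Optimization, the listed method, the following is a minimal sketch of the standard single-pair DPO loss. It is not LeVo's multi-preference alignment procedure, which this page does not describe in detail. The function name, argument names, and the default `beta` are illustrative assumptions, and the inputs are assumed to be summed log-probabilities of a preferred ("chosen") and a dispreferred ("rejected") output under the trained policy and a frozen reference model.

```python
import math


def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Standard DPO loss for one preference pair (illustrative sketch).

    All names and the default beta are assumptions for this example,
    not values taken from the LeVo paper.
    """
    # How much more the policy favors each output than the reference does.
    chosen_margin = policy_chosen_logp - ref_chosen_logp
    rejected_margin = policy_rejected_logp - ref_rejected_logp

    # Scaled difference of the two margins.
    logits = beta * (chosen_margin - rejected_margin)

    # loss = -log(sigmoid(logits)), computed in a numerically stable way.
    if logits >= 0:
        return math.log1p(math.exp(-logits))
    return -logits + math.log1p(math.exp(logits))


if __name__ == "__main__":
    # The policy favors the chosen output more than the reference does,
    # so the loss falls below log(2), the value at zero margin.
    print(dpo_loss(-1.0, -3.0, -2.0, -2.0))
```

When the policy and the reference assign identical log-probabilities, the loss equals log 2. It decreases as the policy shifts probability toward the preferred output relative to the reference.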