SongCreator: Lyrics-based Universal Song Generation
Shun Lei, Yixuan Zhou, Boshi Tang, Max W. Y. Lam, Feng Liu, Hangyu Liu, Jingcheng Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

TL;DR
SongCreator is a novel system that generates complete songs with vocals and accompaniment from lyrics, using a dual-sequence language model and attention strategies, achieving state-of-the-art results across multiple tasks.
Contribution
The paper introduces a dual-sequence language model with attention masks for comprehensive song generation from lyrics, enabling editing and control of vocals and accompaniment.
Findings
Achieves state-of-the-art performance on eight song generation tasks.
Surpasses previous methods significantly in lyrics-to-song and lyrics-to-vocals tasks.
Demonstrates controllability of vocals and accompaniment through audio prompts.
Abstract
Music is an integral part of human culture, embodying human intelligence and creativity, of which songs compose an essential part. While previous works have explored various aspects of song generation, such as singing voice, vocal composition and instrumental arrangement, generating songs with both vocals and accompaniment given lyrics remains a significant challenge, hindering the application of music generation models in the real world. In this light, we propose SongCreator, a song-generation system designed to tackle this challenge. The model features two novel designs: a meticulously designed dual-sequence language model (DSLM) to capture the information of vocals and accompaniment for song generation, and a series of attention mask strategies for DSLM, which allow our model to understand, generate and edit songs, making it suitable for various song-related generation…
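As a rough illustration of what an attention mask does in a language model, the sketch below builds a causal mask, a fully open mask, and a fully closed mask, and applies one row of a mask inside a softmax. This is a generic, minimal example and not the DSLM implementation: the function names, the choice of these three mask types, and the idea of using the open and closed masks to switch attention between a vocal sequence and an accompaniment sequence on or off are all assumptions made for illustration.

```python
import math


def causal_mask(n):
    """Lower-triangular mask: position i may attend to positions 0..i only."""
    return [[j <= i for j in range(n)] for i in range(n)]


def full_mask(n_query, n_key):
    """Every query position may attend to every key position."""
    return [[True] * n_key for _ in range(n_query)]


def empty_mask(n_query, n_key):
    """No query position may attend to any key position (attention disabled)."""
    return [[False] * n_key for _ in range(n_query)]


def masked_softmax(scores, mask_row):
    """Softmax over the scores whose mask entry is True; others get weight 0."""
    allowed = [s for s, m in zip(scores, mask_row) if m]
    if not allowed:
        return [0.0] * len(scores)
    peak = max(allowed)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if m else 0.0 for s, m in zip(scores, mask_row)]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical usage: one sequence attends causally to itself, while attention
# toward a second sequence is either fully open or switched off.
self_attn = causal_mask(3)
cross_open = full_mask(3, 4)
cross_closed = empty_mask(3, 4)
weights = masked_softmax([1.0, 1.0, 5.0], [True, True, False])
```

Swapping the mask while keeping the model weights fixed is one generic way a single network can be steered toward different behaviours; how SongCreator actually defines and combines its masks is described in the paper itself.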
Taxonomy
Topics: Music and Audio Processing
Methods: Softmax · Attention Is All You Need
