CSL-L2M: Controllable Song-Level Lyric-to-Melody Generation Based on Conditional Transformer with Fine-Grained Lyric and Musical Controls
Li Chai, Donglin Wang

TL;DR
CSL-L2M is a novel controllable lyric-to-melody generation framework that uses an in-attention Transformer and multi-level controls to produce high-quality, structured full-song melodies aligned with lyrics and user preferences.
Contribution
The paper introduces REMI-Aligned, a new music representation, and combines multi-level lyric and musical controls within an in-attention Transformer to improve controllability and quality in song-level lyric-to-melody generation.
Findings
Outperforms state-of-the-art models in melody quality and controllability.
Generates well-structured full-song melodies aligned with lyrics.
Enables user control over musical attributes during generation.
Abstract
Lyric-to-melody generation is a highly challenging task in the field of AI music generation. Due to the difficulty of learning strict yet weak correlations between lyrics and melodies, previous methods have suffered from weak controllability, low-quality and poorly structured generation. To address these challenges, we propose CSL-L2M, a controllable song-level lyric-to-melody generation method based on an in-attention Transformer decoder with fine-grained lyric and musical controls, which is able to generate full-song melodies matched with the given lyrics and user-specified musical attributes. Specifically, we first introduce REMI-Aligned, a novel music representation that incorporates strict syllable- and sentence-level alignments between lyrics and melodies, facilitating precise alignment modeling. Subsequently, sentence-level semantic lyric embeddings independently extracted from a…
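As a rough illustration of the conditioning idea named in the abstract, the sketch below assumes that "in-attention" means the control signal is projected to the model width and re-added to the input of every decoder layer, rather than only to the first. The summary here does not spell out the mechanism, so this reading, the function names (`inject_condition`, `toy_layer`, `run_decoder`), and the toy position-wise layer that stands in for a real self-attention block are all assumptions made for illustration, not the authors' implementation.

```python
import math


def matvec(vec, mat):
    """Row vector (length n) times matrix (n x m) -> list of length m."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def inject_condition(hidden, condition, proj):
    """Add the projected per-step condition vector to each hidden state."""
    return [[h + c for h, c in zip(h_t, matvec(c_t, proj))]
            for h_t, c_t in zip(hidden, condition)]


def toy_layer(hidden, weight):
    """Hypothetical stand-in for a self-attention block:
    a position-wise linear map followed by tanh."""
    return [[math.tanh(x) for x in matvec(h_t, weight)] for h_t in hidden]


def run_decoder(embeddings, condition, projs, weights):
    """Re-inject the condition at the input of every layer (assumed scheme)."""
    hidden = embeddings
    for proj, weight in zip(projs, weights):
        hidden = toy_layer(inject_condition(hidden, condition, proj), weight)
    return hidden
```

In this sketch `condition` holds one control vector per time step, which is how sentence-level lyric embeddings or per-bar musical attributes could be broadcast over the tokens they govern. Because each layer sees the condition again, the control signal is not diluted as depth grows.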
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Adam · VQ-VAE · Layer Normalization · Dropout · Position-Wise Feed-Forward Layer · Label Smoothing · Dense Connections · Byte Pair Encoding
