Modeling the Rhythm from Lyrics for Melody Generation of Pop Song
Daiyu Zhang, Ju-Chiang Wang, Katerina Kosta, Jordan B. L. Smith, Shicen Zhou

TL;DR
This paper introduces a two-stage computational framework for generating pop song melodies from lyrics, utilizing part-of-speech tags and Transformer architectures to improve rhythm and pitch modeling, achieving high-quality results in Chinese lyric-to-melody tasks.
Contribution
It presents a novel lyric-to-rhythm framework with POS tags and a Transformer-based rhythm-to-melody model, addressing data constraints and multimodality challenges in automatic melody generation.
Findings
The framework effectively models rhythm and pitch distributions.
Generated melodies are rated as comparable or superior to state-of-the-art.
The approach demonstrates success in Chinese lyric-to-melody generation.
Abstract
Creating a pop song melody according to pre-written lyrics is a typical practice for composers. A computational model of how lyrics are set as melodies is important for automatic composition systems, but an end-to-end lyric-to-melody model would require enormous amounts of paired training data. To mitigate the data constraints, we adopt a two-stage approach, dividing the task into lyric-to-rhythm and rhythm-to-melody modules. However, the lyric-to-rhythm task is still challenging due to its multimodality. In this paper, we propose a novel lyric-to-rhythm framework that includes part-of-speech tags to achieve better text setting, and a Transformer architecture designed to model long-term syllable-to-note associations. For the rhythm-to-melody task, we adapt a proven chord-conditioned melody Transformer, which has achieved state-of-the-art results. Experiments for Chinese lyric-to-melody…
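The two-stage decomposition described in the abstract can be pictured as two functions composed in sequence. The sketch below is illustrative only: every class and function name is hypothetical, and both stages are toy rule-based stand-ins for the paper's Transformer models, not the authors' implementation. It assumes syllables arrive with a part-of-speech tag and that a fixed list of chord tones is available for the pitch stage.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Syllable:
    text: str
    pos: str  # part-of-speech tag of the word the syllable belongs to


@dataclass
class RhythmEvent:
    syllable: str
    onset: float     # in beats
    duration: float  # in beats


@dataclass
class Note:
    syllable: str
    onset: float
    duration: float
    pitch: int  # MIDI note number


def lyric_to_rhythm(syllables: List[Syllable]) -> List[RhythmEvent]:
    """Stage 1 stand-in (assumption): content words get longer notes.

    The paper uses a Transformer here; this rule only shows the interface.
    """
    events = []
    onset = 0.0
    for s in syllables:
        duration = 1.0 if s.pos in {"NOUN", "VERB"} else 0.5
        events.append(RhythmEvent(s.text, onset, duration))
        onset += duration
    return events


def rhythm_to_melody(events: List[RhythmEvent], chord_tones: List[int]) -> List[Note]:
    """Stage 2 stand-in (assumption): cycle through the given chord tones.

    The paper adapts a chord-conditioned melody Transformer for this step.
    """
    return [
        Note(e.syllable, e.onset, e.duration, chord_tones[i % len(chord_tones)])
        for i, e in enumerate(events)
    ]


def lyric_to_melody(syllables: List[Syllable], chord_tones: List[int]) -> List[Note]:
    """Compose the two stages: lyrics -> rhythm -> pitched melody."""
    return rhythm_to_melody(lyric_to_rhythm(syllables), chord_tones)


# Usage with made-up input: three syllables over a C major triad.
lyrics = [Syllable("wo", "PRON"), Syllable("ai", "VERB"), Syllable("ni", "PRON")]
melody = lyric_to_melody(lyrics, [60, 64, 67])
for note in melody:
    print(note)
```

Splitting the task this way means each stage can be trained on its own data: the rhythm stage needs only lyrics aligned with note timings, and the pitch stage needs only melodies with rhythm and chords, so neither depends on a large corpus of fully paired lyric-melody examples.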
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Dropout
