Modeling the Rhythm from Lyrics for Melody Generation of Pop Song

Daiyu Zhang; Ju-Chiang Wang; Katerina Kosta; Jordan B. L. Smith; Shicen Zhou

arXiv:2301.01361 · eess.AS · January 5, 2023 · 1 citation

TL;DR

This paper introduces a two-stage framework for generating pop song melodies from lyrics: a lyric-to-rhythm model that combines part-of-speech tags with a Transformer, followed by a chord-conditioned rhythm-to-melody Transformer. In Chinese lyric-to-melody experiments, the generated melodies are rated as comparable or superior to the state of the art.

Contribution

It presents a novel lyric-to-rhythm framework that incorporates part-of-speech (POS) tags, paired with a Transformer-based rhythm-to-melody model, to address the paired-data constraints and the multimodality of the lyric-to-rhythm task in automatic melody generation.

Findings

01. The framework effectively models rhythm and pitch distributions.

02. Generated melodies are rated as comparable or superior to the state of the art.

03. The approach demonstrates success in Chinese lyric-to-melody generation.

Abstract

Creating a pop song melody according to pre-written lyrics is a typical practice for composers. A computational model of how lyrics are set as melodies is important for automatic composition systems, but an end-to-end lyric-to-melody model would require enormous amounts of paired training data. To mitigate the data constraints, we adopt a two-stage approach, dividing the task into lyric-to-rhythm and rhythm-to-melody modules. However, the lyric-to-rhythm task is still challenging due to its multimodality. In this paper, we propose a novel lyric-to-rhythm framework that includes part-of-speech tags to achieve better text setting, and a Transformer architecture designed to model long-term syllable-to-note associations. For the rhythm-to-melody task, we adapt a proven chord-conditioned melody Transformer, which has achieved state-of-the-art results. Experiments for Chinese lyric-to-melody…
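
The two-stage decomposition described in the abstract can be illustrated with a toy Python sketch. Nothing here comes from the paper's implementation: the `Note` class, the `POS_DURATION` table, and both stage functions are hypothetical stand-ins, with a lookup table in place of the lyric-to-rhythm Transformer and a chord-tone cycle in place of the chord-conditioned melody Transformer. It only shows how the two modules might compose, with rhythm fixed before pitch is chosen.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Note:
    syllable: str
    duration: float   # length in beats
    pitch: int = -1   # MIDI pitch; -1 means "not assigned yet"


# Hypothetical duration table keyed by POS tag (an assumption for
# illustration; the paper learns this mapping with a Transformer).
POS_DURATION = {"NOUN": 1.0, "VERB": 1.0, "PART": 0.5}


def lyric_to_rhythm(tagged: List[Tuple[str, str]]) -> List[Note]:
    """Stage 1 stand-in: give each (syllable, POS tag) pair a duration."""
    return [Note(syl, POS_DURATION.get(pos, 0.5)) for syl, pos in tagged]


def rhythm_to_melody(rhythm: List[Note], chord_tones: List[int]) -> List[Note]:
    """Stage 2 stand-in: keep the rhythm and cycle through chord tones."""
    return [
        Note(n.syllable, n.duration, chord_tones[i % len(chord_tones)])
        for i, n in enumerate(rhythm)
    ]


def lyric_to_melody(tagged: List[Tuple[str, str]],
                    chord_tones: List[int]) -> List[Note]:
    """Compose the two stages: lyrics -> rhythm -> melody."""
    return rhythm_to_melody(lyric_to_rhythm(tagged), chord_tones)


if __name__ == "__main__":
    # Pinyin syllables with illustrative POS tags, over a C major triad.
    lyrics = [("wo", "PRON"), ("ai", "VERB"), ("ni", "PRON")]
    for note in lyric_to_melody(lyrics, [60, 64, 67]):
        print(note)
```

In this sketch the second stage sees only durations and chord tones, never the lyrics, which loosely mirrors the abstract's motivation for splitting the task: the rhythm-to-melody module does not depend on paired lyric-melody data.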

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Layer Normalization · Softmax · Adam · Byte Pair Encoding · Residual Connection · Label Smoothing · Dropout