Joint Learning of Wording and Formatting for Singable Melody-to-Lyric Generation

Longshen Ou; Xichu Ma; Ye Wang

arXiv:2307.02146 · cs.CL · December 15, 2025

Open Access

TL;DR

This paper presents a joint learning approach for melody-to-lyric generation that improves singability by incorporating formatting and prosodic patterns, leading to higher quality and structural adherence in generated lyrics.

Contribution

It introduces a novel training framework combining general-domain pretraining, length awareness, and auxiliary supervision based on musicological insights for better lyric generation.

Findings

01

3.8% absolute improvement in line-count adherence over naïve fine-tuning

02

21.4% absolute improvement in syllable-count adherence over naïve fine-tuning

03

42.2% and 74.2% relative gains in overall quality in human evaluation

Abstract

Despite progress in melody-to-lyric generation, a substantial singability gap remains between machine-generated lyrics and those written by human lyricists. In this work, we aim to narrow this gap by jointly learning both wording and formatting for melody-to-lyric generation. After general-domain pretraining, our model acquires length awareness through a self-supervised stage trained on a large text-only lyric corpus. During supervised melody-to-lyric training, we introduce multiple auxiliary supervision objectives informed by musicological findings on melody-lyric relationships, encouraging the model to capture fine-grained prosodic and structural patterns. Compared with naïve fine-tuning, our approach improves adherence to line-count and syllable-count requirements by 3.8% and 21.4% absolute, respectively, without degrading text quality. In human evaluation, it achieves 42.2% and…
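
To make the two format requirements in the abstract concrete, the following is a minimal Python sketch of how line-count and syllable-count adherence might be checked for one generated lyric. It is not the authors' evaluation code: the function names, the vowel-group syllable heuristic, and the idea of supplying one target syllable count per line (for example, the number of notes in each melody phrase) are all assumptions made for illustration.

```python
import re
from typing import Dict, List


def count_syllables(word: str) -> int:
    """Rough English syllable estimate (assumed heuristic): count vowel groups."""
    word = word.lower()
    n = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing 'e' after a consonant as silent ("shine" -> 1),
    # but keep the "-le" ending syllabic ("little" -> 2).
    if n > 1 and word.endswith("e") and not word.endswith("le") \
            and word[-2] not in "aeiouy":
        n -= 1
    return max(n, 1)


def line_syllables(line: str) -> int:
    """Sum the estimated syllables of every word in one lyric line."""
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", line)
    return sum(count_syllables(w) for w in words)


def format_adherence(lyrics: str, target_syllables: List[int]) -> Dict[str, object]:
    """Compare generated lyrics against hypothetical per-line syllable targets."""
    lines = [ln for ln in lyrics.splitlines() if ln.strip()]
    # Line-count adherence: the lyric has exactly as many lines as targets.
    line_count_ok = len(lines) == len(target_syllables)
    # Syllable-count adherence: fraction of targets met exactly by the
    # line in the same position (missing lines count as misses).
    hits = sum(line_syllables(ln) == t for ln, t in zip(lines, target_syllables))
    accuracy = hits / len(target_syllables) if target_syllables else 0.0
    return {"line_count_ok": line_count_ok, "syllable_accuracy": accuracy}


if __name__ == "__main__":
    demo = "Twinkle twinkle little star\nHow I wonder what you are"
    print(format_adherence(demo, [7, 7]))
```

A real evaluation would likely replace the heuristic with a pronunciation dictionary, since vowel-group counting miscounts many English words; the exact-match criterion per line is likewise only one possible definition of adherence.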

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis