Drop the beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation

Ziqian Ning; Shuai Wang; Yuepeng Jiang; Jixun Yao; Lei He; Shifeng Pan; Jie Ding; Lei Xie

arXiv:2408.15474 · eess.AS · August 29, 2024

TL;DR

This paper introduces Freestyler, a system that generates rap vocals directly from lyrics and an accompaniment track. It combines language-model token generation with a neural vocoder, is supported by the new RapBank dataset, and is reported to produce high-quality vocals that align rhythmically with the beat.

Contribution

The paper presents the first system for rap vocal generation from lyrics and accompaniment, combining language models, flow matching, and neural vocoders, along with a new rap dataset.

Findings

01. High-quality, natural-sounding rap vocal generation

02. Strong stylistic and rhythmic alignment with the accompanying beats

03. Effective zero-shot timbre control

Abstract

Rap, a prominent genre of vocal performance, remains underexplored in vocal generation. General vocal synthesis depends on precise note and duration inputs, requiring users to have related musical knowledge, which limits flexibility. In contrast, rap typically features simpler melodies, with a core focus on a strong rhythmic sense that harmonizes with accompanying beats. In this paper, we propose Freestyler, the first system that generates rapping vocals directly from lyrics and accompaniment inputs. Freestyler utilizes language model-based token generation, followed by a conditional flow matching model to produce spectrograms and a neural vocoder to restore audio. It allows a 3-second prompt to enable zero-shot timbre control. Due to the scarcity of publicly available rap datasets, we also present RapBank, a rap song dataset collected from the internet, alongside a meticulously…
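The three stages named in the abstract (language-model token generation, conditional flow matching to a spectrogram, and a neural vocoder) can be sketched as a simple data flow. This is only an illustrative sketch and not the authors' implementation: every function name, shape, and constant below is an assumption, the token generator draws random ids instead of running a language model, the velocity field is a toy straight-line field rather than a learned network, and the "vocoder" merely produces a waveform of the matching length.

```python
import numpy as np

# All names, shapes, and constants here are illustrative assumptions,
# not Freestyler's actual API or configuration.
VOCAB_SIZE = 1024      # hypothetical token vocabulary size
N_MELS = 80            # hypothetical number of mel bins
HOP = 256              # hypothetical samples per spectrogram frame
SAMPLE_RATE = 16000    # hypothetical sample rate in Hz
FRAMES_PER_SECOND = 50 # hypothetical token/frame rate


def generate_tokens(lyrics, accompaniment, seed=0):
    """Stage 1 stand-in. A real system would sample tokens from a language
    model conditioned on lyrics and accompaniment; this stub ignores the
    lyrics and draws random ids, one per frame of accompaniment."""
    rng = np.random.default_rng(seed)
    n_frames = max(1, int(len(accompaniment) / SAMPLE_RATE * FRAMES_PER_SECOND))
    return rng.integers(0, VOCAB_SIZE, size=n_frames)


def flow_matching_decode(tokens, prompt_mel, n_steps=8, seed=0):
    """Stage 2 stand-in. Euler integration of an ODE from noise (t=0) to a
    spectrogram (t=1). The velocity field is a toy that points along a
    straight line toward a target built from the tokens and the prompt."""
    rng = np.random.default_rng(seed)
    bins = np.arange(N_MELS)[None, :]
    target = np.sin(tokens[:, None] * 0.01 + bins * 0.1) + prompt_mel.mean(axis=0)
    x = rng.standard_normal(target.shape)  # start from Gaussian noise
    dt = 1.0 / n_steps
    for step in range(n_steps):
        t = step * dt
        velocity = (target - x) / (1.0 - t)  # toy field, not a trained model
        x = x + dt * velocity
    return x


def vocode(mel):
    """Stage 3 stand-in. A neural vocoder would restore audio from the
    spectrogram; this stub modulates a sine tone by the frame energy."""
    envelope = np.repeat(np.abs(mel).mean(axis=1), HOP)
    n = np.arange(envelope.size)
    return envelope * np.sin(2 * np.pi * 220.0 * n / SAMPLE_RATE)


def rap_pipeline(lyrics, accompaniment, prompt_mel):
    tokens = generate_tokens(lyrics, accompaniment)
    mel = flow_matching_decode(tokens, prompt_mel)
    return tokens, mel, vocode(mel)


accompaniment = np.zeros(2 * SAMPLE_RATE)                # 2 s of silence as a placeholder beat
prompt_mel = np.zeros((3 * FRAMES_PER_SECOND, N_MELS))   # stand-in for a 3-second timbre prompt
tokens, mel, audio = rap_pipeline("placeholder lyrics", accompaniment, prompt_mel)
print(tokens.shape, mel.shape, audio.shape)
```

The point of the sketch is the interface between stages: discrete tokens in, a frame-aligned spectrogram out of the flow-matching step, and a waveform whose length is the frame count times the hop size.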

Code & Models

Datasets

zqning/RapBank
dataset · 26 dl

Videos

Drop the Beat! Freestyler for Accompaniment Conditioned Rapping Voice Generation · underline

Taxonomy

Topics: Music Technology and Sound Studies · Speech and Audio Processing
