Pop2Piano : Pop Audio-based Piano Cover Generation

Jongho Choi; Kyogu Lee

arXiv:2211.00895 · cs.SD · April 4, 2023

TL;DR

This paper introduces Pop2Piano, a Transformer-based model that generates piano covers directly from pop audio waveforms. It is trained on a large, newly created dataset of paired and synchronized pop songs and piano covers.

Contribution

We built a large dataset of synchronized {Pop, Piano Cover} pairs using an automated pipeline, and a Transformer model that generates piano covers directly from pop audio without melody or chord extraction modules.

Findings

01

Pop2Piano produces plausible piano covers from pop audio.

02

The dataset enables data-driven training for cover generation.

03

To the authors' knowledge, Pop2Piano is the first model to generate piano covers directly from pop audio without melody and chord extraction modules.

Abstract

Piano covers of pop music are enjoyed by many people. However, the task of automatically generating piano covers of pop music is still understudied. This is partly due to the lack of synchronized {Pop, Piano Cover} data pairs, which made it challenging to apply the latest data-intensive deep learning-based methods. To leverage the power of the data-driven approach, we make a large amount of paired and synchronized {Pop, Piano Cover} data using an automated pipeline. In this paper, we present Pop2Piano, a Transformer network that generates piano covers given waveforms of pop music. To the best of our knowledge, this is the first model to generate a piano cover directly from pop audio without using melody and chord extraction modules. We show that Pop2Piano, trained with our dataset, is capable of producing plausible piano covers.

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization