Pop2Piano : Pop Audio-based Piano Cover Generation
Jongho Choi, Kyogu Lee

TL;DR
This paper introduces Pop2Piano, a Transformer-based model that generates piano covers directly from pop audio waveforms. It is trained on a newly built, large dataset of paired and synchronized pop songs and piano covers.
Contribution
We developed a large synchronized dataset and a novel Transformer model for direct pop-to-piano cover generation without melody or chord extraction.
Findings
Pop2Piano produces plausible piano covers from pop audio.
The dataset enables data-driven training for cover generation.
To the best of the authors' knowledge, Pop2Piano is the first model to generate piano covers directly from pop audio without melody and chord extraction modules.
Abstract
Piano covers of pop music are enjoyed by many people. However, the task of automatically generating piano covers of pop music is still understudied. This is partly due to the lack of synchronized {Pop, Piano Cover} data pairs, which made it challenging to apply the latest data-intensive deep learning-based methods. To leverage the power of the data-driven approach, we make a large amount of paired and synchronized {Pop, Piano Cover} data using an automated pipeline. In this paper, we present Pop2Piano, a Transformer network that generates piano covers given waveforms of pop music. To the best of our knowledge, this is the first model to generate a piano cover directly from pop audio without using melody and chord extraction modules. We show that Pop2Piano, trained with our dataset, is capable of producing plausible piano covers.
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Softmax · Adam · Position-Wise Feed-Forward Layer · Dense Connections · Label Smoothing · Absolute Position Encodings · Layer Normalization
