Deep Performer: Score-to-Audio Music Performance Synthesis

Hao-Wen Dong; Cong Zhou; Taylor Berg-Kirkpatrick; Julian McAuley

arXiv:2202.06034 · cs.SD · February 22, 2022

TL;DR

Deep Performer adapts transformer-based text-to-speech techniques to score-to-audio music performance synthesis, adds techniques for handling polyphonic inputs and long notes, achieves quality competitive with a conditional generative audio baseline in a listening test, and outperforms baseline models on a piano dataset.

Contribution

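The paper presents a new system for synthesizing music performances from scores, with two new techniques for a transformer encoder-decoder model (one for handling polyphonic inputs, one for providing fine-grained conditioning), and introduces a new violin dataset of paired recordings and scores with estimated alignments for training.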
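Polyphony is the main difference from speech: a text-to-speech length regulator can expand each phoneme into a run of frames because phonemes do not overlap, whereas the notes of a chord sound at the same time. The sketch below shows one minimal way to build a frame-level input from note-level embeddings under that constraint. It assumes each note has onset and offset frame indices (for example, derived from the estimated score-to-audio alignments) and that overlapping notes are combined by summing their embeddings; the function name mix_polyphonic and the summation rule are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np


def mix_polyphonic(note_emb, onsets, offsets, n_frames):
    """Expand note-level embeddings into a frame-level sequence (illustrative sketch).

    note_emb: array of shape (n_notes, dim), e.g. one encoder output per note.
    onsets, offsets: frame indices per note; offsets are exclusive.
    Returns an array of shape (n_frames, dim). Frames where several notes
    sound receive the sum of their embeddings; silent frames stay zero.
    """
    frames = np.zeros((n_frames, note_emb.shape[1]), dtype=note_emb.dtype)
    for emb, start, end in zip(note_emb, onsets, offsets):
        # Clip to the valid range so a slightly misaligned note cannot overflow.
        start, end = max(int(start), 0), min(int(end), n_frames)
        frames[start:end] += emb
    return frames


# Two overlapping notes: note 0 covers frames 0-3, note 1 covers frames 2-4.
emb = np.array([[1.0, 0.0], [0.0, 2.0]])
mixed = mix_polyphonic(emb, np.array([0, 2]), np.array([4, 5]), n_frames=6)
print(mixed)
```

In this example, frames 2 and 3, where both notes sound, come out as [1, 2], and frame 5, after both notes end, stays zero. Summation keeps the decoder input length equal to the number of audio frames no matter how many notes overlap; a fixed number of per-voice slots or attention pooling over active notes would be other reasonable choices.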

Findings

01

Synthesizes music with clear polyphony and harmonic structures.

02

Achieves quality competitive with the baseline, a conditional generative audio model, in pitch accuracy, timbre, and noise level in a listening test.

03

Outperforms baseline models on a piano dataset.

Abstract

Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer -- a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing a fine-grained conditioning in a transformer encoder-decoder model. To train our proposed system, we present a new violin dataset consisting of paired recordings and scores along with estimated alignments between them. We show that our proposed model can synthesize music with clear polyphony and harmonic structures. In a listening test, we achieve competitive quality against the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre and noise level. Moreover,…
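Long notes raise a separate issue: if every frame of a sustained note received the same conditioning vector, the decoder would have no signal for where it is within the note. One hedged way to add finer-grained conditioning is a per-frame encoding of the relative position inside each note, sketched below. The function name note_position_encoding, the choice of relative (rather than absolute) position, and the transformer-style sinusoidal features are all assumptions for illustration, not a description of the paper's exact technique.

```python
import numpy as np


def note_position_encoding(onset, offset, dim=8, scale=100.0):
    """Sinusoidal encoding of each frame's relative position inside one note.

    Hypothetical helper. Returns an array of shape (offset - onset, dim).
    Row 0 corresponds to the onset (relative position 0); the last row is
    the frame just before the offset.
    """
    if offset <= onset:
        raise ValueError("offset must be greater than onset")
    if dim % 2:
        raise ValueError("dim must be even")
    length = offset - onset
    rel = np.arange(length) / length           # relative position in [0, 1)
    pos = rel[:, None] * scale                 # stretch so the sinusoids vary across a note
    i = np.arange(dim // 2)[None, :]
    freqs = 1.0 / (10000.0 ** (2 * i / dim))   # transformer-style frequency ladder
    enc = np.empty((length, dim))
    enc[:, 0::2] = np.sin(pos * freqs)         # even channels: sine
    enc[:, 1::2] = np.cos(pos * freqs)         # odd channels: cosine
    return enc


# A 4-frame note and an 8-frame note share the same code at their midpoints.
short = note_position_encoding(0, 4)
long_ = note_position_encoding(10, 18)
print(np.allclose(short[2], long_[4]))  # True
```

These vectors could be added to the frame-level conditioning for the frames a note covers. Using relative position makes the onset, middle, and release of a note look alike regardless of its duration, at the cost of discarding absolute length, which could be supplied as a separate feature.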

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing

Methods: Transformer (encoder-decoder)