Deep Performer: Score-to-Audio Music Performance Synthesis
Hao-Wen Dong, Cong Zhou, Taylor Berg-Kirkpatrick, Julian McAuley

TL;DR
Deep Performer is a transformer-based system for score-to-audio music performance synthesis. It adds techniques for handling polyphony and long notes, and in a listening test its quality is competitive with a conditional generative audio baseline.
Contribution
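The paper presents a transformer encoder-decoder system for synthesizing audio from scores, proposes two techniques (one for handling polyphonic inputs, one for fine-grained conditioning), and introduces a new violin dataset of paired recordings and scores with estimated alignments.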
Findings
Synthesizes music with clear polyphony and harmonic structures.
In a listening test, achieves quality competitive with the baseline (a conditional generative audio model) in pitch accuracy, timbre, and noise level.
Outperforms the baseline on a piano dataset.
Abstract
Music performance synthesis aims to synthesize a musical score into a natural performance. In this paper, we borrow recent advances in text-to-speech synthesis and present the Deep Performer -- a novel system for score-to-audio music performance synthesis. Unlike speech, music often contains polyphony and long notes. Hence, we propose two new techniques for handling polyphonic inputs and providing a fine-grained conditioning in a transformer encoder-decoder model. To train our proposed system, we present a new violin dataset consisting of paired recordings and scores along with estimated alignments between them. We show that our proposed model can synthesize music with clear polyphony and harmonic structures. In a listening test, we achieve competitive quality against the baseline model, a conditional generative audio model, in terms of pitch accuracy, timbre and noise level. Moreover,…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
Methods: Fast Attention Via Positive Orthogonal Random Features · Performer
