Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset

Curtis Hawthorne; Andriy Stasyuk; Adam Roberts; Ian Simon; Cheng-Zhi Anna Huang; Sander Dieleman; Erich Elsen; Jesse Engel; Douglas Eck

arXiv:1810.12247 · cs.SD · January 21, 2019 · 149 cites

TL;DR

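This paper introduces Wave2Midi2Wave, a method for generating structured piano music that uses notes as an intermediate representation: separate models transcribe audio into notes, compose new note sequences, and synthesize audio from notes. The result is audio that stays coherent on timescales from ~0.1 ms to ~100 s. Training these models is made possible by a new dataset, MAESTRO.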

Contribution

The paper factorizes piano music generation into three stages linked by a symbolic note representation (transcription, composition, and synthesis) and releases a new dataset of finely aligned audio and note labels on which all three models can be trained.
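
Read as a probability model, this factorization can be written as a sum over possible note sequences. The notation below is our own sketch of one reading, not an equation quoted from the paper: p(notes) is the job of the composition model, p(audio | notes) is the job of the synthesis model, and the transcription model supplies notes for a given recording.

```latex
p(\mathrm{audio}) = \sum_{\mathrm{notes}} p(\mathrm{audio} \mid \mathrm{notes}) \, p(\mathrm{notes})
```

Under this reading, generation runs in two steps: sample a note sequence from p(notes), then sample a waveform from p(audio | notes).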

Findings

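1. Achieved coherent music generation on timescales from ~0.1 ms to ~100 s, a span of six orders of magnitude
2. Released the MAESTRO dataset: over 172 hours of virtuosic piano performances with note labels aligned to the audio to within ~3 ms
3. Trained a suite of transcription, composition, and synthesis models that use notes as an intermediate representation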
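
The "six orders of magnitude" in the first finding is the base-10 logarithm of the ratio between the longest and shortest timescales. A minimal check using the approximate endpoints quoted in the abstract; the helper name `orders_of_magnitude` is our own, chosen for illustration:

```python
import math

# Approximate timescale endpoints quoted in the abstract, in seconds.
shortest_s = 0.1e-3  # ~0.1 ms
longest_s = 100.0    # ~100 s

def orders_of_magnitude(short: float, long: float) -> float:
    """Return log10 of the ratio between two timescales (hypothetical helper)."""
    return math.log10(long / short)

span = orders_of_magnitude(shortest_s, longest_s)
print(f"{span:.1f} orders of magnitude")  # prints "6.0 orders of magnitude"
```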

Abstract

Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The…
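
To make "notes as an intermediate representation" concrete, here is a toy sketch of the notes-to-audio direction. It is not the paper's synthesis model, which is a learned neural network; it simply sums one sine wave per note. The `Note` fields (MIDI pitch, onset and offset in seconds, velocity), the function names, and the 16 kHz sample rate are illustrative assumptions.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence

@dataclass
class Note:
    # Hypothetical note record; field names are illustrative, not a MAESTRO schema.
    pitch: int       # MIDI pitch number, e.g. 60 = middle C
    onset_s: float   # note start time in seconds
    offset_s: float  # note end time in seconds
    velocity: int    # MIDI velocity, 1-127

def midi_to_hz(pitch: int) -> float:
    """Equal-tempered frequency with A4 (MIDI 69) tuned to 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12.0)

def render_sines(notes: Sequence[Note], sample_rate: int = 16000) -> List[float]:
    """Toy synthesizer (assumption): one sine per note, no envelope or timbre."""
    if not notes:
        return []
    # Buffer length covers the latest note offset.
    total = int(math.ceil(max(n.offset_s for n in notes) * sample_rate))
    audio = [0.0] * total
    for n in notes:
        # Note times map directly to sample indices.
        start = int(round(n.onset_s * sample_rate))
        end = min(total, int(round(n.offset_s * sample_rate)))
        freq = midi_to_hz(n.pitch)
        amp = n.velocity / 127.0  # louder notes get a larger amplitude
        for i in range(start, end):
            t = (i - start) / sample_rate  # time since this note's onset
            audio[i] += amp * math.sin(2.0 * math.pi * freq * t)
    return audio

notes = [Note(60, 0.0, 0.5, 100), Note(64, 0.25, 0.75, 80)]
audio = render_sines(notes)
print(len(audio))  # 12000 samples = 0.75 s at 16 kHz
```

Because each note carries explicit onset and offset times, the renderer can place it at an exact sample index. At the assumed 16 kHz rate, the ~3 ms alignment quoted for MAESTRO corresponds to about 48 samples.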

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing