Enabling Factorized Piano Music Modeling and Generation with the MAESTRO Dataset
Curtis Hawthorne, Andriy Stasyuk, Adam Roberts, Ian Simon, Cheng-Zhi Anna Huang, Sander Dieleman, Erich Elsen, Jesse Engel, Douglas Eck

TL;DR
This paper introduces Wave2Midi2Wave, a method for generating structured piano music by leveraging a new dataset and models that transcribe, compose, and synthesize audio with coherence across multiple timescales.
Contribution
The paper factorizes piano music generation into three stages (transcription, composition, and synthesis) that use notes as an intermediate representation, and releases the MAESTRO dataset that enables training them.
Findings
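Achieved audio generation with coherent musical structure on timescales from ~0.1 ms to ~100 s
Released the MAESTRO dataset: over 172 hours of virtuosic piano performances with ~3 ms alignment between note labels and audio
Trained a suite of models that transcribe, compose, and synthesize piano audio using notes as an intermediate representation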
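The timescale range in the first finding, roughly 0.1 ms to 100 s, corresponds to about six orders of magnitude. A minimal arithmetic check, assuming the approximate endpoints quoted here (the variable names are illustrative only):

```python
import math

# Approximate timescale endpoints quoted in the summary, in seconds.
shortest_s = 0.1e-3   # ~0.1 ms
longest_s = 100.0     # ~100 s

# Orders of magnitude = base-10 logarithm of the ratio of the endpoints.
orders_of_magnitude = math.log10(longest_s / shortest_s)
print(round(orders_of_magnitude))
```

Because both endpoints are approximate, the result is best read as "about six" rather than an exact figure.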
Abstract
Generating musical audio directly with neural networks is notoriously difficult because it requires coherently modeling structure at many different timescales. Fortunately, most music is also highly structured and can be represented as discrete note events played on musical instruments. Herein, we show that by using notes as an intermediate representation, we can train a suite of models capable of transcribing, composing, and synthesizing audio waveforms with coherent musical structure on timescales spanning six orders of magnitude (~0.1 ms to ~100 s), a process we call Wave2Midi2Wave. This large advance in the state of the art is enabled by our release of the new MAESTRO (MIDI and Audio Edited for Synchronous TRacks and Organization) dataset, composed of over 172 hours of virtuosic piano performances captured with fine alignment (~3 ms) between note labels and audio waveforms. The…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing
