LakhNES: Improving multi-instrumental music generation with cross-domain pre-training
Chris Donahue, Huanru Henry Mao, Yiting Ethan Li, Garrison W. Cottrell, Julian McAuley

TL;DR
This paper introduces LakhNES, a Transformer-based model for multi-instrumental music generation that is pre-trained on the Lakh MIDI dataset and fine-tuned on NES-MDB, and shows that this cross-domain pre-training improves performance.
Contribution
The paper adapts the Transformer architecture to multi-instrumental music generation and proposes a novel cross-domain pre-training technique that leverages heterogeneous music datasets.
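Cross-domain pre-training requires expressing Lakh MIDI songs, which can use any number of General MIDI instruments, in the four-voice format of NES-MDB (pulse voices P1 and P2, triangle TR, noise NO). The Python sketch below shows one possible heuristic for that mapping. The `Track` type, the function name `assign_nes_voices`, and the rules themselves (drums to noise, keep the three busiest melodic tracks, lowest of those to triangle) are hypothetical illustrations, not the procedure described in the paper.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Track:
    name: str
    is_drum: bool
    pitches: list[int]  # MIDI pitch numbers of the notes in this track


def assign_nes_voices(tracks: list[Track]) -> dict[str, str]:
    """Map a multi-instrument MIDI song onto NES voices (hypothetical heuristic)."""
    assignment: dict[str, str] = {}
    drums = [t for t in tracks if t.is_drum and t.pitches]
    if drums:
        # Percussion goes to the noise voice; only the first drum track is kept.
        assignment[drums[0].name] = "NO"
    melodic = [t for t in tracks if not t.is_drum and t.pitches]
    # Keep the three melodic tracks with the most notes (ties keep input order).
    busiest = sorted(melodic, key=lambda t: len(t.pitches), reverse=True)[:3]
    # Order the kept tracks from highest to lowest average pitch.
    by_register = sorted(busiest, key=lambda t: mean(t.pitches), reverse=True)
    if by_register:
        # The lowest kept track becomes the bass-like triangle voice.
        assignment[by_register[-1].name] = "TR"
        # Any remaining kept tracks fill the two pulse voices, highest first.
        for track, voice in zip(by_register[:-1], ("P1", "P2")):
            assignment[track.name] = voice
    return assignment


song = [
    Track("piano", False, [60, 64, 67]),
    Track("bass", False, [36, 40]),
    Track("drums", True, [36, 38]),
    Track("flute", False, [72, 76]),
    Track("strings", False, [55]),
]
result = assign_nes_voices(song)
print(result)  # {'drums': 'NO', 'bass': 'TR', 'flute': 'P1', 'piano': 'P2'}
```

Tracks beyond the three kept melodic parts (here `strings`) are simply dropped, which is the main information loss in any mapping onto a fixed four-voice format.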
Findings
Pre-training on Lakh MIDI improves generation quality.
Transformers effectively model multi-instrumental sequences.
Cross-domain transfer learning enhances performance.
Abstract
We are interested in the task of generating multi-instrumental music scores. The Transformer architecture has recently shown great promise for the task of piano score generation; here we adapt it to the multi-instrumental setting. Transformers are complex, high-dimensional language models which are capable of capturing long-term structure in sequence data, but require large amounts of data to fit. Their success on piano score generation is partially explained by the large volumes of symbolic data readily available for that domain. We leverage the recently-introduced NES-MDB dataset of four-instrument scores from an early video game sound synthesis chip (the NES), which we find to be well-suited to training with the Transformer architecture. To further improve the performance of our model, we propose a pre-training technique to leverage the information in a large collection of…
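To model a multi-instrumental score with a single Transformer language model, the four voices must be flattened into one token sequence. A common way to do this is an event-based encoding in which per-voice note-on and note-off tokens are interleaved with wait (time-shift) tokens. The sketch below assumes that style of encoding; the token spellings (`P1_NOTEON_72`, `WT_24`, and so on), the tick resolution, the function name `events_to_tokens`, and the `max_wait` cap are hypothetical choices, not the paper's exact vocabulary.

```python
from typing import Optional

VOICES = ("P1", "P2", "TR", "NO")  # NES-MDB voice names

# An event is (time in ticks, voice, pitch); pitch None means note-off.
Event = tuple[int, str, Optional[int]]


def events_to_tokens(events: list[Event], max_wait: int = 100) -> list[str]:
    """Flatten a four-voice score into one event-token sequence (hypothetical vocabulary)."""
    order = {voice: i for i, voice in enumerate(VOICES)}
    tokens: list[str] = []
    now = 0
    # Process events chronologically; simultaneous events follow the P1, P2, TR, NO order.
    for time, voice, pitch in sorted(events, key=lambda e: (e[0], order[e[1]])):
        gap = time - now
        # Long silences are split into several capped wait tokens so the vocabulary stays finite.
        while gap > 0:
            step = min(gap, max_wait)
            tokens.append(f"WT_{step}")
            gap -= step
        now = time
        if pitch is None:
            tokens.append(f"{voice}_NOTEOFF")
        else:
            tokens.append(f"{voice}_NOTEON_{pitch}")
    return tokens


score = [
    (0, "P1", 72),
    (0, "TR", 48),
    (24, "P1", None),
    (30, "NO", 5),  # for the noise voice this value stands in for a noise setting, not a pitch
    (150, "TR", None),
]
print(events_to_tokens(score))
# ['P1_NOTEON_72', 'TR_NOTEON_48', 'WT_24', 'P1_NOTEOFF', 'WT_6', 'NO_NOTEON_5', 'WT_100', 'WT_20', 'TR_NOTEOFF']
```

Capping wait tokens keeps the vocabulary small at the cost of slightly longer sequences, a trade-off that matters because a Transformer's self-attention cost grows with sequence length.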
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
Methods: Linear Layer · Absolute Position Encodings · Position-Wise Feed-Forward Layer · Residual Connection · Byte Pair Encoding · Dense Connections · Label Smoothing · Adam · Softmax
