Language Models are Drummers: Drum Composition with Natural Language Pre-Training
Li Zhang, Chris Callison-Burch

TL;DR
This paper explores transferring knowledge from large pre-trained language models to music generation. It shows that GPT-3, fine-tuned on only hundreds of MIDI files of drum performances, can generate reasonable drum grooves, and it proposes a new evaluation method for drum groove quality.
Contribution
It introduces a novel approach of applying language model transfer learning to drum music generation and proposes a tailored evaluation method for drum groove quality.
Findings
GPT-3 can generate reasonable drum grooves after fine-tuning.
The pre-trained model (GPT-3) outperforms a Transformer that is not pre-trained, which shows no ability beyond naive repetition.
A new structural evaluation method for drum grooves is proposed.
Abstract
Automatic music generation with artificial intelligence typically requires a large amount of data, which is hard to obtain for many less common genres and musical instruments. To tackle this issue, we present ongoing work and preliminary findings on the possibility for deep models to transfer knowledge from language to music, by fine-tuning large language models pre-trained on a massive text corpus on only hundreds of MIDI files of drum performances. We show that by doing so, one of the largest, state-of-the-art models (GPT-3) is capable of generating reasonable drum grooves, while models that are not pre-trained (Transformer) show no such ability beyond naive repetition. Evaluating generated music is a challenging task, and evaluating drum grooves, which has little precedent in the literature, is even more so. Hence, we propose a tailored structural evaluation method and analyze drum grooves produced by…
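Fine-tuning a text model on MIDI drum performances requires some way of writing a groove as text. The abstract does not describe the encoding used, so the following is only a minimal sketch under assumptions: the 16-step grid, the symbols `K`, `S`, `H`, and the function names `encode_groove` and `decode_groove` are all hypothetical and chosen for illustration, not taken from the paper.

```python
from typing import Dict, List

# Hypothetical one-letter symbols per instrument (an assumption, not the paper's vocabulary).
SYMBOLS = {"kick": "K", "snare": "S", "hihat": "H"}
REST = "-"  # token for a grid step with no hits


def encode_groove(groove: Dict[str, List[int]], steps: int = 16) -> str:
    """Serialize a groove to text: one space-separated token per grid step."""
    tokens = []
    for step in range(steps):
        # Concatenate the symbols of every instrument that hits on this step.
        hits = "".join(
            sym for name, sym in SYMBOLS.items() if step in groove.get(name, [])
        )
        tokens.append(hits or REST)
    return " ".join(tokens)


def decode_groove(text: str) -> Dict[str, List[int]]:
    """Parse the text form back into a mapping from instrument to hit steps."""
    inverse = {sym: name for name, sym in SYMBOLS.items()}
    groove: Dict[str, List[int]] = {name: [] for name in SYMBOLS}
    for step, token in enumerate(text.split()):
        if token == REST:
            continue
        for ch in token:
            groove[inverse[ch]].append(step)
    return groove


# One bar of a basic rock beat on a 16th-note grid.
basic_rock = {
    "kick": [0, 8],
    "snare": [4, 12],
    "hihat": [0, 2, 4, 6, 8, 10, 12, 14],
}
encoded = encode_groove(basic_rock)
print(encoded)
```

A text form like this could serve as a training example for a language model, and `decode_groove` maps generated text back to hit positions that could then be written out as MIDI notes. Velocity and microtiming are ignored in this sketch.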
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
