Towards multi-instrument drum transcription
Richard Vogl, Gerhard Widmer, Peter Knees

TL;DR
This paper introduces a large synthetic dataset and neural network models for multi-instrument drum transcription, moving beyond the traditional three-instrument focus (snare drum, bass drum, and hi-hat) to cover more of the instruments found in a standard drum kit.
Contribution
The paper provides the first large-scale synthetic dataset for multi-instrument drum transcription and evaluates neural network models that support a broader range of drum instruments.
Findings
Models trained on the synthetic dataset generalize well to real data.
Supporting more instruments improves transcription versatility.
Publicly available trained models facilitate future research.
Abstract
Automatic drum transcription, a subtask of the more general automatic music transcription, deals with extracting drum instrument note onsets from an audio source. Recently, progress in transcription performance has been made using non-negative matrix factorization as well as deep learning methods. However, these works primarily focus on transcribing three drum instruments only: snare drum, bass drum, and hi-hat. Yet, for many applications, the ability to transcribe more drum instruments which make up standard drum kits used in western popular music would be desirable. In this work, convolutional and convolutional recurrent neural networks are trained to transcribe a wider range of drum instruments. First, the shortcomings of publicly available datasets in this context are discussed. To overcome these limitations, a larger synthetic dataset is introduced. Then, methods to train models…
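To make the task in the abstract concrete, the following is a minimal sketch of how per-instrument onset times could be read off a network's frame-wise output. It is not the authors' method: the function name `pick_onsets`, the frame rate of 100 frames per second, the 0.5 threshold, and the simple local-maximum rule are all assumptions chosen for illustration.

```python
import numpy as np


def pick_onsets(activations, fps=100.0, threshold=0.5):
    """Turn frame-wise activations into onset times per instrument.

    activations: array of shape (num_frames, num_instruments) with values
                 in [0, 1], e.g. the output of a transcription network.
    fps:         assumed frame rate of the activations (hypothetical value).
    threshold:   assumed minimum activation for a frame to count as an onset.
    Returns a dict mapping instrument index to a list of onset times (seconds).
    """
    act = np.asarray(activations, dtype=float)
    onsets = {}
    for inst in range(act.shape[1]):
        a = act[:, inst]
        # Pad with -inf so the first and last frames have neighbours to compare to.
        padded = np.concatenate(([-np.inf], a, [-np.inf]))
        prev_frame = padded[:-2]
        next_frame = padded[2:]
        # A frame is an onset if it reaches the threshold and is a local maximum.
        # On a plateau, only the first frame is reported.
        is_peak = (a >= threshold) & (a > prev_frame) & (a >= next_frame)
        onsets[inst] = (np.flatnonzero(is_peak) / fps).tolist()
    return onsets


# Toy example: 10 frames, 3 instruments (indices are placeholders, not a real mapping).
toy = np.zeros((10, 3))
toy[2, 0] = 0.9   # strong peak for instrument 0 at frame 2
toy[5, 1] = 0.8   # strong peak for instrument 1 at frame 5
toy[7, 2] = 0.3   # below threshold, so no onset for instrument 2
result = pick_onsets(toy)
```

Supporting more instruments, as the paper aims to do, would correspond to more columns in the activation matrix; the peak-picking step itself would not need to change.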
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis
