Towards Realistic Synthetic Data for Automatic Drum Transcription

Pierfrancesco Melucci; Paolo Merialdo; Taketo Akama

arXiv:2601.09520 · cs.SD · January 15, 2026

Open Access

TL;DR

This paper presents a semi-supervised approach for building a large, diverse synthetic drum dataset from unlabeled audio. An Automatic Drum Transcription (ADT) model trained on this data outperforms both fully supervised and previous synthetic-data methods on the ENST and MDB datasets.

Contribution

The authors introduce a semi-supervised method that automatically curates one-shot drum samples from unlabeled audio and uses them to synthesize training data from MIDI files, reducing reliance on paired audio-MIDI datasets and narrowing the domain gap of synthetic data.

Findings

01 Achieved state-of-the-art results on ENST and MDB datasets.

02 Outperformed fully supervised and previous synthetic-data methods.

03 Demonstrated effectiveness of synthetic data for ADT training.

Abstract

Deep learning models define the state-of-the-art in Automatic Drum Transcription (ADT), yet their performance is contingent upon large-scale, paired audio-MIDI datasets, which are scarce. Existing workarounds that use synthetic data often introduce a significant domain gap, as they typically rely on low-fidelity SoundFont libraries that lack acoustic diversity. While high-quality one-shot samples offer a better alternative, they are not available in a standardized, large-scale format suitable for training. This paper introduces a new paradigm for ADT that circumvents the need for paired audio-MIDI training data. Our primary contribution is a semi-supervised method to automatically curate a large and diverse corpus of one-shot drum samples from unlabeled audio sources. We then use this corpus to synthesize a high-quality dataset from MIDI files alone, which we use to train a…
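
The synthesis step described in the abstract (rendering audio from MIDI using a corpus of one-shot samples) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `render_drum_track`, the `(onset, class, velocity)` event layout, and the `sample_bank` dictionary are all hypothetical, and the events are assumed to have been parsed from a MIDI file beforehand.

```python
import random


def render_drum_track(events, sample_bank, sample_rate=44100,
                      duration_s=2.0, seed=0):
    """Mix one-shot drum samples into a mono buffer at the given onsets.

    events:      list of (onset_seconds, drum_class, velocity), velocity in 1..127
                 (hypothetical layout, assumed already parsed from MIDI)
    sample_bank: dict mapping drum_class -> list of one-shots,
                 each one-shot being a list of float samples (assumption)
    Returns (audio, labels): the mixed buffer and the aligned onset labels.
    """
    rng = random.Random(seed)
    n = int(round(duration_s * sample_rate))
    audio = [0.0] * n
    labels = []
    for onset_s, drum_class, velocity in events:
        # Pick a random one-shot for this class to vary the timbre.
        one_shot = rng.choice(sample_bank[drum_class])
        gain = velocity / 127.0  # simple linear velocity-to-gain mapping
        start = int(round(onset_s * sample_rate))
        for i, x in enumerate(one_shot):
            if start + i >= n:
                break  # truncate tails that run past the end of the buffer
            audio[start + i] += gain * x
        # The label comes for free: it is the event that was rendered.
        labels.append((onset_s, drum_class))
    return audio, labels


if __name__ == "__main__":
    # Toy values only: real one-shots would be loaded from audio files.
    bank = {"kick": [[1.0, 0.5]], "snare": [[0.25]]}
    audio, labels = render_drum_track(
        [(0.0, "kick", 127), (0.5, "snare", 127)],
        bank, sample_rate=4, duration_s=1.0,
    )
    print(audio, labels)
```

The point of the sketch is that every rendered clip arrives with exact onset labels, so no paired audio-MIDI recordings are needed. In this setup the realism of the result depends on the quality and diversity of the one-shots in `sample_bank`.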

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech Recognition and Synthesis