Global Structure-Aware Drum Transcription Based on Self-Attention Mechanisms

Ryoto Ishizuka; Ryo Nishikimi; Kazuyoshi Yoshii

arXiv:2105.05791 · cs.SD · May 13, 2021

Open Access

TL;DR

This paper introduces a global structure-aware drum transcription method that uses self-attention mechanisms to estimate tatum-level scores directly from music signals. It outperforms conventional RNN-based models in tatum-level error rate, even when only limited paired training data is available.

Contribution

It proposes a deep transcription model with a self-attention-based tatum-level decoder, together with a regularized training method that uses a pretrained score language model to improve drum transcription.

Findings

01 Outperforms RNN-based models in tatum-level error rate

02 Effective with limited paired training data

03 Enhances musical naturalness of estimated scores

Abstract

This paper describes an automatic drum transcription (ADT) method that directly estimates a tatum-level drum score from a music signal, in contrast to most conventional ADT methods that estimate the frame-level onset probabilities of drums. To estimate a tatum-level score, we propose a deep transcription model that consists of a frame-level encoder for extracting the latent features from a music signal and a tatum-level decoder for estimating a drum score from the latent features pooled at the tatum level. To capture the global repetitive structure of drum scores, which is difficult to learn with a recurrent neural network (RNN), we introduce a self-attention mechanism with tatum-synchronous positional encoding into the decoder. To mitigate the difficulty of training the self-attention-based model from an insufficient amount of paired data and improve the musical naturalness of the…
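The pipeline described in the abstract (frame-level features pooled at the tatum level, then self-attention with a positional encoding indexed by tatum) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the use of max-pooling, the sinusoidal form of the encoding, the single attention head, and the random weights and features are all assumptions made for the example.

```python
import numpy as np

def pool_frames_to_tatums(frame_feats, tatum_bounds):
    """Pool frame-level features into one vector per tatum.

    frame_feats: array of shape (n_frames, dim).
    tatum_bounds: increasing frame indices of length n_tatums + 1;
    every interval must contain at least one frame.
    Max-pooling is an assumption; the abstract only says "pooled".
    """
    return np.stack([
        frame_feats[start:end].max(axis=0)
        for start, end in zip(tatum_bounds[:-1], tatum_bounds[1:])
    ])

def sinusoidal_encoding(n_positions, dim):
    """Sinusoidal positional encoding, indexed here by tatum (dim must be even)."""
    pos = np.arange(n_positions)[:, None]
    i = np.arange(dim // 2)[None, :]
    angles = pos / (10000 ** (2 * i / dim))
    pe = np.zeros((n_positions, dim))
    pe[:, 0::2] = np.sin(angles)
    pe[:, 1::2] = np.cos(angles)
    return pe

def self_attention(x, w_q, w_k, w_v):
    """Single-head scaled dot-product self-attention over the tatum sequence."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v, weights

# Toy input: 40 frames of 8-dim features, 8 tatums of 5 frames each (hypothetical).
rng = np.random.default_rng(0)
frames = rng.standard_normal((40, 8))
bounds = np.arange(0, 41, 5)
tatums = pool_frames_to_tatums(frames, bounds) + sinusoidal_encoding(8, 8)
w_q, w_k, w_v = (rng.standard_normal((8, 8)) for _ in range(3))
out, attn = self_attention(tatums, w_q, w_k, w_v)
print(out.shape, attn.shape)
```

Because the attention weights connect every tatum to every other tatum in one step, a repeated drum pattern many bars away can influence the current position directly, which is the property the abstract contrasts with an RNN.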

Taxonomy

Topics: Music and Audio Processing · Music Technology and Sound Studies · Speech and Audio Processing