Improved Architecture for High-resolution Piano Transcription to Efficiently Capture Acoustic Characteristics of Music Signals

Jinyi Mi; Sehun Kim; Tomoki Toda

arXiv:2409.19614·cs.SD·October 1, 2024

TL;DR

This paper introduces an improved high-resolution piano transcription model that uses the Constant-Q Transform as its input representation and new architectures built on a CRNN with dilated convolutions and Transformer decoders, improving note-level accuracy while reducing model size.

Contribution

The paper proposes novel architectures combining CRNN with dilated convolutions and Transformer decoders, enhancing transcription accuracy and reducing model size.
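As a rough illustration of what a dilated convolution does, the sketch below applies a small kernel whose taps are spaced `dilation` samples apart and computes the receptive field of a stack of such layers. It is a minimal sketch, not the paper's layers: the function names, the kernel size of 3, and the dilation schedule (1, 2, 4) are assumptions chosen for illustration, since this summary does not give the actual configuration.

```python
def dilated_conv1d(signal, kernel, dilation):
    """Valid (no padding) 1-D cross-correlation with taps spaced `dilation` apart."""
    span = (len(kernel) - 1) * dilation  # distance between first and last tap
    out = []
    for i in range(len(signal) - span):
        out.append(sum(w * signal[i + j * dilation] for j, w in enumerate(kernel)))
    return out


def receptive_field(kernel_size, dilations):
    """Number of input samples seen by one output of a stack of dilated layers."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Hypothetical example values, not taken from the paper.
example_output = dilated_conv1d([1, 2, 3, 4, 5, 6, 7], [1, 1, 1], dilation=2)
stack_field = receptive_field(kernel_size=3, dilations=[1, 2, 4])
```

With the same three-tap kernel, widening the dilation lets each output cover a larger input span without adding parameters, which is one common reason dilated convolutions are used to keep models small.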

Findings

1. Achieved consistent improvement in note-level metrics.

2. Developed smaller models with comparable or better performance.

3. Demonstrated effectiveness of Constant-Q Transform input representation.

Abstract

Automatic music transcription (AMT), which aims to convert musical signals into musical notation, is an important task in music information retrieval. Recent works have used high-resolution labels, i.e., the continuous onset and offset times of piano notes, as training targets, achieving substantial improvements in transcription performance. However, some issues remain: for example, the harmonics of notes are sometimes recognized as false-positive notes, and AMT models tend to grow larger in order to improve transcription performance. To address these issues, we propose an improved high-resolution piano transcription model that better captures specific acoustic characteristics of music signals. First, we employ the Constant-Q Transform as the input representation to better adapt to musical signals. Moreover, we have designed two architectures:…
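One way to see why a Constant-Q Transform suits musical signals: its bins are spaced geometrically, so the h-th harmonic of any note sits a fixed number of bins above the fundamental, regardless of pitch. The sketch below computes the bin center frequencies and that offset. It is a minimal sketch under assumed settings: starting at 27.5 Hz (A0, the lowest piano key), 48 bins per octave, and 8 octaves are illustrative choices, not values reported in this summary, and the function names are hypothetical.

```python
import math


def cqt_center_frequencies(f_min, n_bins, bins_per_octave):
    """Geometrically spaced center frequencies: f_k = f_min * 2^(k / B)."""
    return [f_min * 2.0 ** (k / bins_per_octave) for k in range(n_bins)]


def harmonic_bin_offset(harmonic, bins_per_octave):
    """Bins between a fundamental and its h-th harmonic: B * log2(h)."""
    return bins_per_octave * math.log2(harmonic)


# Assumed settings for illustration only (not from the paper).
F_MIN = 27.5            # A0 in Hz
BINS_PER_OCTAVE = 48    # 4 bins per semitone
N_BINS = 8 * BINS_PER_OCTAVE

freqs = cqt_center_frequencies(F_MIN, N_BINS, BINS_PER_OCTAVE)
offsets = {h: harmonic_bin_offset(h, BINS_PER_OCTAVE) for h in (2, 3, 4)}
```

Because the offsets depend only on the harmonic number and the bins-per-octave setting, the harmonic pattern of a note has the same shape at every pitch on this axis, whereas on a linear-frequency spectrogram the spacing between harmonics grows with the fundamental.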

Taxonomy

Topics: Music and Audio Processing · Speech and Audio Processing · Speech Recognition and Synthesis