Non-Local Musical Statistics as Guides for Audio-to-Score Piano Transcription
Kentaro Shibata, Eita Nakamura, Kazuyoshi Yoshii

TL;DR
This paper presents an audio-to-score piano transcription system that combines deep-neural-network-based multipitch detection with statistical-model-based rhythm quantization. It uses non-local statistics of pitch and rhythmic content to improve the estimation of global characteristics such as tempo scale, metre, and bar line positions.
Contribution
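The paper integrates deep-neural-network-based multipitch detection with statistical-model-based rhythm quantization, and formulates non-local statistics of pitch and rhythmic content, derived from musical knowledge, to infer global musical structure in automatic piano transcription.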
Findings
Achieved 7.1% transcription error rate on a popular piano dataset.
Attained an 85.6% downbeat F-measure, demonstrating effective rhythm estimation.
Non-local statistics of pitch and rhythmic content significantly improved the estimation of global characteristics (tempo scale, metre, and bar line positions).
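For readers unfamiliar with the downbeat F-measure reported above, the following is a minimal sketch of how such a score is commonly computed: the harmonic mean of precision and recall over matched downbeats. The function name `downbeat_f_measure`, the time-based matching, and the 70 ms tolerance are illustrative assumptions; the summary does not state the paper's exact evaluation protocol, which may differ.

```python
def downbeat_f_measure(estimated, reference, tolerance=0.07):
    """F-measure between estimated and reference downbeat times (in seconds).

    Assumption: an estimated downbeat counts as correct if it lies within
    `tolerance` seconds of a not-yet-matched reference downbeat. The 70 ms
    default is a hypothetical choice, not a value taken from the paper.
    """
    est = sorted(estimated)
    ref = sorted(reference)
    matched = 0
    i = j = 0
    # Greedy one-to-one matching over the two sorted lists.
    while i < len(est) and j < len(ref):
        diff = est[i] - ref[j]
        if abs(diff) <= tolerance:
            matched += 1
            i += 1
            j += 1
        elif diff < 0:
            i += 1  # estimate is too early to match this reference
        else:
            j += 1  # reference has no estimate close enough
    if matched == 0:
        return 0.0
    precision = matched / len(est)
    recall = matched / len(ref)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    estimated = [0.0, 2.0, 4.05, 6.5]
    reference = [0.0, 2.0, 4.0, 6.0]
    print(downbeat_f_measure(estimated, reference))
```

In this toy example three of the four estimated downbeats fall within the tolerance, so precision and recall are both 3/4. A score such as 85.6% means that most, but not all, bar line positions were placed correctly under whatever matching criterion the evaluation used.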
Abstract
We present an automatic piano transcription system that converts polyphonic audio recordings into musical scores. This has been a long-standing problem of music information processing, and recent studies have made remarkable progress in the two main component techniques: multipitch detection and rhythm quantization. Given this situation, we study a method integrating deep-neural-network-based multipitch detection and statistical-model-based rhythm quantization. In the first part, we conducted systematic evaluations and found that while the present method achieved high transcription accuracies at the note level, some global characteristics of music, such as tempo scale, metre (time signature), and bar line positions, were often incorrectly estimated. In the second part, we formulated non-local statistics of pitch and rhythmic contents that are derived from musical knowledge and studied…
