A Distribution Matching Approach to Neural Piano Transcription with Optimal Transport
Weixing Wei, Raynaldi Lalang, Dichucheng Li, Kazuyoshi Yoshii

TL;DR
This paper introduces a novel neural piano transcription method that models the task as an optimal transport problem, improving alignment and performance over traditional classification approaches.
Contribution
The paper proposes a new OT-based loss function and a harmonics-aware CRNN architecture for improved neural piano transcription.
Findings
Achieved state-of-the-art onset detection performance on the MAESTRO dataset.
Demonstrated the versatility of OT loss in existing models.
Improved temporal alignment in transcription results.
Abstract
This paper describes a novel paradigm that formalizes automatic piano transcription (APT) as an optimal transport (OT) problem, not as a frame-level multi-label binary classification problem. Our method learns to minimize the cost of transporting a predicted distribution of note events to the ground-truth distribution over time and frequency. The OT loss can thus accommodate temporal misalignment, leading to perceptually relevant optimization. We also propose a convolutional recurrent neural network (CRNN) with a harmonics-aware attention mechanism to capture the spectro-temporal dependencies inherent in music. Our experiments using the MAESTRO dataset showed that our method attained state-of-the-art performance in onset detection. We also confirmed the versatility of the OT loss by applying it to existing models.
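To illustrate why a transport cost can accommodate temporal misalignment where a frame-level classification loss cannot, here is a minimal sketch. It is not the paper's loss: it assumes a single pitch, a time axis only (the paper transports over time and frequency), unit spacing between frames, and the closed-form 1-D Wasserstein-1 distance computed from cumulative sums. The helper names `w1_1d` and `bce` are hypothetical.

```python
import numpy as np

def w1_1d(p, q):
    """Wasserstein-1 distance between two histograms on a shared, unit-spaced 1-D grid.

    In one dimension the optimal transport cost equals the summed absolute
    difference between the two cumulative distributions.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize to probability distributions
    q = q / q.sum()
    return float(np.abs(np.cumsum(p) - np.cumsum(q)).sum())

def bce(pred, target, eps=1e-7):
    """Mean frame-wise binary cross-entropy (the classification-style baseline)."""
    pred = np.clip(np.asarray(pred, dtype=float), eps, 1 - eps)
    target = np.asarray(target, dtype=float)
    return float(-(target * np.log(pred) + (1 - target) * np.log(1 - pred)).mean())

# Toy onset activations for one pitch over 10 frames (illustrative data only).
target = np.zeros(10); target[4] = 1.0  # true onset at frame 4
near = np.zeros(10); near[5] = 1.0      # prediction off by one frame
far = np.zeros(10); far[9] = 1.0        # prediction off by five frames

ot_near, ot_far = w1_1d(near, target), w1_1d(far, target)
bce_near, bce_far = bce(near, target), bce(far, target)

print("OT  cost, near vs far:", ot_near, ot_far)
print("BCE cost, near vs far:", bce_near, bce_far)
```

In this toy setting the transport cost grows with the size of the timing error, while the frame-wise loss penalizes the one-frame and five-frame errors equally, since each is one missed frame plus one false alarm.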
