Scoring Time Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription
Yujia Yan, Zhiyao Duan

TL;DR
This paper presents a transformer-based method for scoring time intervals in automatic piano transcription. An encoder-only transformer operating on low-resolution features produces the interval scores, and the resulting system achieves state-of-the-art accuracy.
Contribution
It introduces a simple interval scoring method using scaled inner products inspired by attention, and demonstrates that a non-hierarchical transformer can accurately transcribe piano events from low-resolution features.
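The scaled inner product scoring described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The per-frame embeddings `queries` and `keys`, their roles as interval-start and interval-end representations, and the masking of invalid intervals with negative infinity are all assumptions made for illustration.

```python
import numpy as np

def interval_scores(queries, keys):
    """Score every candidate interval [i, j] with a scaled inner product.

    queries, keys: arrays of shape (T, d). These are hypothetical per-frame
    embeddings for interval starts (queries) and interval ends (keys) of a
    single event type; the names are assumptions, not taken from the paper.
    Returns a (T, T) matrix whose entry [i, j] equals q_i . k_j / sqrt(d)
    when i <= j, and -inf when i > j (an interval cannot end before it starts).
    """
    T, d = queries.shape
    scores = queries @ keys.T / np.sqrt(d)         # same form as attention logits
    valid = np.triu(np.ones((T, T), dtype=bool))   # keep only start <= end
    return np.where(valid, scores, -np.inf)

# Toy usage with random embeddings: 6 frames, embedding size 4.
rng = np.random.default_rng(0)
T, d = 6, 4
q = rng.standard_normal((T, d))
k = rng.standard_normal((T, d))
S = interval_scores(q, k)
```

One matrix product yields all T × T candidate scores at once, which is what makes this form of scoring cheap compared with scoring each interval separately.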
Findings
Achieves state-of-the-art F1 scores on the Maestro dataset.
Demonstrates high accuracy and time precision in piano transcription.
Shows theoretically that, under a mild condition, the inner product scoring method is expressive enough to represent an ideal scoring matrix.
Abstract
The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription. In this framework, all events (notes or pedals) are represented as closed time intervals tied to specific event types. The neural semi-CRF approach requires an interval scoring matrix that assigns a score for every candidate interval. However, designing an efficient and expressive architecture for scoring intervals is not trivial. This paper introduces a simple method for scoring intervals using scaled inner product operations that resemble how attention scoring is done in transformers. We show theoretically that, due to the special structure from encoding the non-overlapping intervals, under a mild condition, the inner product operations are expressive enough to represent an ideal scoring matrix that can yield the correct transcription result. We then…
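To make the role of the interval scoring matrix concrete, the sketch below decodes the highest-scoring set of non-overlapping intervals from a given score matrix with dynamic programming. It is a simplified illustration, not the paper's semi-CRF inference: it assumes a single event type, assumes that selected intervals may not share any frame, and gives uncovered frames a score of zero. The function name `decode_intervals` is hypothetical.

```python
import numpy as np

def decode_intervals(S):
    """Pick non-overlapping closed intervals [i, j] maximizing the total score.

    S: (T, T) array where S[i, j] is the score of interval [i, j] for i <= j.
    Simplifying assumptions (not from the paper): intervals share no frame,
    and frames left uncovered contribute a score of 0.
    Returns (list of (start, end) pairs, total score).
    """
    T = S.shape[0]
    best = np.zeros(T + 1)       # best[t]: best total score over frames 0..t-1
    back = [None] * (T + 1)      # back[t]: start of the interval ending at t-1
    for t in range(1, T + 1):
        best[t] = best[t - 1]    # option 1: frame t-1 is left uncovered
        j = t - 1
        for i in range(j + 1):   # option 2: an interval [i, j] ends at frame j
            cand = best[i] + S[i, j]
            if cand > best[t]:
                best[t], back[t] = cand, i
    # Backtrack from the last frame to recover the selected intervals.
    intervals, t = [], T
    while t > 0:
        if back[t] is None:
            t -= 1
        else:
            intervals.append((back[t], t - 1))
            t = back[t]
    return intervals[::-1], float(best[T])

# Toy score matrix: most intervals are penalized, a few are rewarded.
S_toy = np.full((5, 5), -1.0)
S_toy[0, 1], S_toy[1, 3], S_toy[3, 4], S_toy[2, 2] = 2.0, 2.5, 1.5, 0.5
decoded, total = decode_intervals(S_toy)
print(decoded, total)  # [(0, 1), (2, 2), (3, 4)] 4.0
```

In the toy example, interval [1, 3] has the highest individual score, but it overlaps the others, so the decoder prefers the three compatible intervals whose scores sum to 4.0.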
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Diverse Musicological Studies
