Note-Level Singing Melody Transcription for Time-Aligned Musical Score Generation
Leekyung Kim, Sungwook Jeon, Wan Heo, Jonghun Park

TL;DR
This paper presents an end-to-end framework for note-level singing melody transcription that recognizes pitch, onset, offset, and note value, enabling accurate generation of time-aligned musical scores from audio recordings.
Contribution
It introduces a novel integrated model with tokenized representations and pseudo-labeling for note value extraction, improving transcription accuracy over existing methods.
Findings
Outperforms state-of-the-art methods in note-level transcription accuracy.
Introduces new metrics for evaluating temporal and note value accuracy.
Qualitative analysis confirms effective note value capture.
Abstract
Automatic music transcription converts audio recordings into symbolic representations, facilitating music analysis, retrieval, and generation. In the audio domain, a musical note is characterized by pitch, onset, and offset, whereas in the musical score domain it is defined by pitch and note value. A time-aligned score, derived from timing information along with pitch and note value, allows a part of the score to be matched with the corresponding part of the music audio, enabling various applications. In this paper, we consider an extension of the traditional note-level transcription task, which recognizes onset, offset, and pitch, that additionally extracts note value so that a time-aligned score can be generated from an audio input. To address this new challenge, we propose an end-to-end framework that integrates recognition of the note value, pitch, and temporal information.…
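To make the two note representations concrete, the following is a minimal sketch of how an audio-domain note (pitch, onset, offset) relates to a score-domain note (pitch, note value) in a time-aligned score. It is not the paper's method: the paper proposes a learned end-to-end model, whereas this sketch uses a naive rule that rounds each duration to the nearest note value under an assumed fixed tempo. The names `AudioNote`, `ScoreNote`, `NOTE_VALUES`, `nearest_note_value`, and `to_time_aligned_score`, and the `bpm` parameter, are all hypothetical.

```python
from dataclasses import dataclass

# Illustrative grid of note values in beats (quarter note = 1 beat).
# Assumption: no dotted notes, triplets, or rests.
NOTE_VALUES = {
    "sixteenth": 0.25,
    "eighth": 0.5,
    "quarter": 1.0,
    "half": 2.0,
    "whole": 4.0,
}


@dataclass
class AudioNote:
    """A note as described in the audio domain."""
    pitch: int     # MIDI pitch number
    onset: float   # start time in seconds
    offset: float  # end time in seconds


@dataclass
class ScoreNote:
    """A note as described in the score domain."""
    pitch: int
    note_value: str


def nearest_note_value(duration_sec: float, bpm: float) -> str:
    """Round a duration to the closest note value, assuming a constant tempo."""
    beats = duration_sec * bpm / 60.0
    return min(NOTE_VALUES, key=lambda name: abs(NOTE_VALUES[name] - beats))


def to_time_aligned_score(notes, bpm):
    """Pair each score note with the audio span (onset, offset) it came from."""
    aligned = []
    for n in notes:
        value = nearest_note_value(n.offset - n.onset, bpm)
        aligned.append((n.onset, n.offset, ScoreNote(n.pitch, value)))
    return aligned


if __name__ == "__main__":
    audio_notes = [
        AudioNote(pitch=60, onset=0.0, offset=0.5),
        AudioNote(pitch=62, onset=0.5, offset=0.74),
        AudioNote(pitch=64, onset=1.0, offset=2.1),
    ]
    for onset, offset, note in to_time_aligned_score(audio_notes, bpm=120):
        print(f"{onset:.2f}-{offset:.2f}s  pitch={note.pitch}  {note.note_value}")
```

A fixed-tempo rounding rule like this breaks down for expressive singing, where tempo drifts and note lengths deviate from the notated values, which is one reason to treat note value as something to be recognized rather than computed from onset and offset alone.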
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies · Human Motion and Animation
