Aligned Music Notation and Lyrics Transcription
Eliseo Fuentes-Martínez, Antonio Ríos-Vila, Juan C. Martinez-Sevilla, David Rizo, Jorge Calvo-Zaragoza

TL;DR
This paper introduces the first comprehensive framework for transcribing music notation and lyrics in vocal scores while preserving the alignment between them, and supports it with new datasets and evaluation metrics.
Contribution
The paper formalizes, for the first time, the Aligned Music Notation and Lyrics Transcription (AMNLT) challenge and evaluates several approaches to it, including end-to-end methods and language modeling.
Findings
End-to-end approaches outperform heuristic methods in alignment accuracy.
Language models excel with sufficient training data.
New datasets and metrics facilitate evaluation of transcription and alignment.
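To illustrate why alignment needs its own evaluation, the sketch below compares a reference and a predicted transcription with a generic edit-distance error rate. This is not the metric defined in the paper: the `(note, syllable)` pair representation, the function names, and the toy data are all assumptions made for illustration. The example shows a case where the music and the lyrics are each transcribed perfectly, yet the syllables are attached to the wrong notes.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def error_rate(ref, hyp):
    """Edit distance normalized by reference length (generic, hypothetical metric)."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return edit_distance(ref, hyp) / len(ref)


# Hypothetical aligned representation: one (note, syllable) pair per note;
# an empty syllable means the previous syllable is still being sung.
reference = [("G4", "Ky"), ("A4", ""), ("G4", "ri")]
predicted = [("G4", "Ky"), ("A4", "ri"), ("G4", "")]


def notes(pairs):
    return [note for note, _ in pairs]


def syllables(pairs):
    return [syl for _, syl in pairs if syl]


music_error = error_rate(notes(reference), notes(predicted))
lyrics_error = error_rate(syllables(reference), syllables(predicted))
aligned_error = error_rate(reference, predicted)

print(music_error, lyrics_error, aligned_error)
```

In this toy case the music-only and lyrics-only error rates are both zero, while the pair-level error rate is not, which is the kind of discrepancy an alignment-aware metric is meant to expose.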
Abstract
The digitization of vocal music scores presents unique challenges that go beyond traditional Optical Music Recognition (OMR) and Optical Character Recognition (OCR), as it necessitates preserving the critical alignment between music notation and lyrics. This alignment is essential for proper interpretation and processing in practical applications. This paper introduces and formalizes, for the first time, the Aligned Music Notation and Lyrics Transcription (AMNLT) challenge, which addresses the complete transcription of vocal scores by jointly considering music symbols, lyrics, and their synchronization. We analyze different approaches to address this challenge, ranging from traditional divide-and-conquer methods that handle music and lyrics separately, to novel end-to-end solutions including direct transcription, unfolding mechanisms, and language modeling. To evaluate these methods, we…
Taxonomy
Topics: Music and Audio Processing · Music Technology and Sound Studies
