Lyrics Transcription for Humans: A Readability-Aware Benchmark
Ondřej Cífka, Hendrik Schreiber, Luke Miner, Fabian-Robert Stöter

TL;DR
This paper introduces Jam-ALT, a new benchmark for automatic lyrics transcription (ALT) that emphasizes readability, formatting, and contextual nuances. It addresses a gap in existing ALT benchmarks, which focus solely on word accuracy.
Contribution
The paper presents Jam-ALT, a revised and standardized lyrics transcription benchmark whose evaluation metrics account for formatting and context, with the aim of rewarding more readable transcripts.
Findings
Existing ALT benchmarks focus only on word accuracy.
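Jam-ALT provides a complete revision of the JamendoLyrics dataset that follows industry standards for lyrics transcription and formatting, together with lyric-specific evaluation metrics.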
Experimental results highlight the importance of formatting and contextual understanding.
Abstract
Writing down lyrics for human consumption involves not only accurately capturing word sequences, but also incorporating punctuation and formatting for clarity and to convey contextual information. This includes song structure, emotional emphasis, and contrast between lead and background vocals. While automatic lyrics transcription (ALT) systems have advanced beyond producing unstructured strings of words and are able to draw on wider context, ALT benchmarks have not kept pace and continue to focus exclusively on words. To address this gap, we introduce Jam-ALT, a comprehensive lyrics transcription benchmark. The benchmark features a complete revision of the JamendoLyrics dataset, in adherence to industry standards for lyrics transcription and formatting, along with evaluation metrics designed to capture and assess the lyric-specific nuances, laying the foundation for improving the…
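The abstract's point that word-only benchmarks overlook punctuation, formatting, and background-vocal markup can be illustrated with a minimal sketch. It computes a standard word error rate (WER) twice on the same pair of texts: once after a typical lowercase-and-strip-punctuation normalization, and once over tokens that keep case, punctuation, and line breaks. The tokenizers, the example lyrics, and the use of parentheses for background vocals are hypothetical illustrations; they are not the metrics defined in Jam-ALT, which this summary does not specify.

```python
import re

# Hypothetical illustration only: these tokenizers and this plain WER are NOT
# the Jam-ALT metrics, which this summary does not specify.


def wer(reference: list[str], hypothesis: list[str]) -> float:
    """Word error rate: token-level Levenshtein distance / reference length."""
    rows, cols = len(reference) + 1, len(hypothesis) + 1
    # d[i][j] = edit distance between reference[:i] and hypothesis[:j]
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all i reference tokens
    for j in range(cols):
        d[0][j] = j  # insert all j hypothesis tokens
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
    return d[-1][-1] / max(len(reference), 1)


def words_only(text: str) -> list[str]:
    """Word-only view: lowercase, keep words, drop punctuation and line breaks."""
    return re.findall(r"[a-z0-9']+", text.lower())


def tokens_with_formatting(text: str) -> list[str]:
    """Case-sensitive view: punctuation marks and line breaks are tokens too."""
    return re.findall(r"\n|\w+(?:'\w+)*|[^\w\s]", text)


# Hypothetical lyrics; parentheses mark a background vocal (assumed convention).
reference = "Hold on, hold on!\n(Hold on)"
hypothesis = "hold on hold on hold on"

print(wer(words_only(reference), words_only(hypothesis)))
print(wer(tokens_with_formatting(reference), tokens_with_formatting(hypothesis)))
```

Under the word-only view the hypothesis scores a perfect 0.0, while the formatting-aware tokenization yields an error rate of about 0.64, penalizing the missing comma, exclamation mark, line break, parentheses, and capitalization. A real readability-aware benchmark would likely report such aspects separately rather than folding them into one number; this sketch only shows why a word-only score cannot see them.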
Taxonomy
Topics: Natural Language Processing Techniques · Music and Audio Processing
