Lyrics Transcription for Humans: A Readability-Aware Benchmark

Ondřej Cífka; Hendrik Schreiber; Luke Miner; Fabian-Robert Stöter

arXiv:2408.06370·eess.AS·August 14, 2024

TL;DR

This paper introduces Jam-ALT, a benchmark for automatic lyrics transcription (ALT) that evaluates readability, formatting, and contextual cues in addition to word accuracy. It addresses a gap in existing ALT benchmarks, which focus solely on words.

Contribution

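The paper presents Jam-ALT, a lyrics transcription benchmark built on a complete revision of the JamendoLyrics dataset that follows industry standards for transcription and formatting, together with evaluation metrics that cover lyric-specific formatting and context as well as words.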

Findings

01. Existing ALT benchmarks focus only on word accuracy.

02. Jam-ALT provides a comprehensive, industry-standard dataset and evaluation metrics.

03. Experimental results highlight the importance of formatting and contextual understanding.

Abstract

Writing down lyrics for human consumption involves not only accurately capturing word sequences, but also incorporating punctuation and formatting for clarity and to convey contextual information. This includes song structure, emotional emphasis, and contrast between lead and background vocals. While automatic lyrics transcription (ALT) systems have advanced beyond producing unstructured strings of words and are able to draw on wider context, ALT benchmarks have not kept pace and continue to focus exclusively on words. To address this gap, we introduce Jam-ALT, a comprehensive lyrics transcription benchmark. The benchmark features a complete revision of the JamendoLyrics dataset, in adherence to industry standards for lyrics transcription and formatting, along with evaluation metrics designed to capture and assess the lyric-specific nuances, laying the foundation for improving the…
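To make the gap described in the abstract concrete, the following is a minimal sketch of how a words-only error rate can hide formatting errors that a format-aware token error rate exposes. It is an illustration only, not the metrics defined by the paper or implemented in audioshake/alt-eval: the function names `tokenize`, `edit_distance`, and `error_rate`, the `<nl>` line-break token, and the example lyric are all assumptions made here.

```python
import re


def tokenize(text, keep_format=False):
    """Split lyrics into tokens.

    keep_format=False: lowercase, words only (the classic WER view).
    keep_format=True: keep case, punctuation marks, and a "<nl>" token
    for each line break (a hypothetical convention for this sketch).
    """
    if not keep_format:
        return re.findall(r"[a-z0-9']+", text.lower())
    text = text.replace("\n", " <nl> ")
    return re.findall(r"<nl>|[A-Za-z0-9']+|[^\sA-Za-z0-9']", text)


def edit_distance(ref, hyp):
    """Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (r != h)))  # substitution or match
        prev = cur
    return prev[-1]


def error_rate(ref, hyp, keep_format=False):
    """Token edit distance divided by the number of reference tokens."""
    ref_tokens = tokenize(ref, keep_format)
    hyp_tokens = tokenize(hyp, keep_format)
    return edit_distance(ref_tokens, hyp_tokens) / max(len(ref_tokens), 1)


# Made-up example: the hypothesis has every word right but loses the
# capitalization, comma, line break, and background-vocal parentheses.
reference = "Hello, darkness\n(My old friend)"
hypothesis = "hello darkness my old friend"

print(error_rate(reference, hypothesis))                    # words only
print(error_rate(reference, hypothesis, keep_format=True))  # format-aware
```

In this example the words-only rate is zero because every word matches, while the format-aware rate is well above zero because the lost punctuation, casing, and line break all count as errors. A real benchmark would likely report these aspects separately rather than folding them into a single number.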

Code & Models

Repositories

audioshake/alt-eval (official)

Taxonomy

Topics: Natural Language Processing Techniques · Music and Audio Processing