Towards Building an End-to-End Multilingual Automatic Lyrics Transcription Model

Jiawen Huang; Emmanouil Benetos

arXiv:2406.17618 · eess.AS · June 26, 2024

TL;DR

This paper develops a multilingual automatic lyrics transcription (ALT) system by extending architectures proven effective for English to multiple languages, and shows that multilingual models with language conditioning outperform monolingual models.

Contribution

It adapts effective English ALT architectures to a multilingual setting by expanding vocabularies and incorporating language conditioning, demonstrating improved performance over monolingual models.
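
The two ideas named above, vocabulary expansion and language conditioning, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy transcripts, the special-token names, and the choice of a language token prepended to the target sequence are all assumptions made for illustration. This summary does not say which conditioning methods the paper uses.

```python
from typing import Dict, List

# Hypothetical toy transcripts; not taken from the paper's datasets.
TOY_LYRICS: Dict[str, List[str]] = {
    "en": ["hello world"],
    "fr": ["ça va"],
    "de": ["grüß dich"],
}

# Assumed special tokens; the names are illustrative only.
SPECIAL_TOKENS = ["<blank>", "<sos>", "<eos>"]


def build_vocab(lyrics_by_lang: Dict[str, List[str]]) -> Dict[str, int]:
    """Build a character vocabulary as the union over all languages."""
    chars = sorted(
        {ch for lines in lyrics_by_lang.values() for line in lines for ch in line}
    )
    # One token per language, used below as a conditioning signal.
    lang_tokens = [f"<{lang}>" for lang in sorted(lyrics_by_lang)]
    tokens = SPECIAL_TOKENS + lang_tokens + chars
    return {tok: idx for idx, tok in enumerate(tokens)}


def encode(text: str, lang: str, vocab: Dict[str, int]) -> List[int]:
    """Encode text as <sos> <lang> c1 c2 ... <eos>.

    Prepending the language token is one simple way to pass language
    information to a decoder (an assumption, not the paper's stated method).
    """
    ids = [vocab["<sos>"], vocab[f"<{lang}>"]]
    ids += [vocab[ch] for ch in text]
    ids.append(vocab["<eos>"])
    return ids


vocab = build_vocab(TOY_LYRICS)                       # multilingual vocabulary
mono_vocab = build_vocab({"en": TOY_LYRICS["en"]})    # English-only baseline
ids = encode("ça va", "fr", vocab)
```

In this sketch the multilingual vocabulary is strictly larger than the English-only one, so the model's output layer grows with it, and the language token gives the decoder an explicit hint about which subset of characters to expect.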

Findings

01. Multilingual ALT models outperform monolingual counterparts.

02. Incorporating language information significantly improves transcription accuracy.

03. Multilingual models show consistent performance across different languages.

Abstract

Multilingual automatic lyrics transcription (ALT) is a challenging task due to the limited availability of labelled data and the challenges introduced by singing, compared to multilingual automatic speech recognition. Although some multilingual singing datasets have been released recently, English continues to dominate these collections. Multilingual ALT remains underexplored due to the scale of data and annotation quality. In this paper, we aim to create a multilingual ALT system with available datasets. Inspired by architectures that have been proven effective for English ALT, we adapt these techniques to the multilingual scenario by expanding the target vocabulary set. We then evaluate the performance of the multilingual model in comparison to its monolingual counterparts. Additionally, we explore various conditioning methods to incorporate language information into the model. We…

Code & Models

Repositories

jhuang448/MultilingualALT (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Music and Audio Processing · Topic Modeling