On-Line Audio-to-Lyrics Alignment Based on a Reference Performance

Charles Brazier; Gerhard Widmer

arXiv:2107.14496 · eess.AS · August 2, 2021 · ISMIR · 1 cites

Open Access

TL;DR

This paper introduces a real-time audio-to-lyrics alignment system that tracks lyrics in different languages using phoneme probability predictions and a reference performance, with robustness demonstrated on classical opera and Jingju.

Contribution

The work presents what the authors describe as the first real-time-capable audio-to-lyrics alignment pipeline that robustly tracks lyrics in different languages without additional language information.

Findings

01

Achieves accurate real-time lyrics tracking in classical opera

02

Demonstrates robustness to out-of-training languages, tested on Jingju (Beijing opera)

03

Uses phoneme probability vectors and reference performances for alignment

Abstract

Audio-to-lyrics alignment has become an increasingly active research task in MIR, supported by the emergence of several open-source datasets of audio recordings with word-level lyrics annotations. However, there are still a number of open problems, such as a lack of robustness in the face of severe duration mismatches between audio and lyrics representation; a certain degree of language-specificity caused by acoustic differences across languages; and the fact that most successful methods in the field are not suited to work in real-time. Real-time lyrics alignment (tracking) would have many useful applications, such as fully automated subtitle display in live concerts and opera. In this work, we describe the first real-time-capable audio-to-lyrics alignment pipeline that is able to robustly track the lyrics of different languages, without additional language information. The proposed…
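
The idea of tracking a live performance against a reference recording can be illustrated with a small sketch. The abstract as shown here is truncated and does not specify the alignment algorithm, so the code below is only an assumption: a simplified online dynamic-time-warping tracker that compares each incoming phoneme probability vector with the frames of a precomputed reference matrix inside a search window. The names `OnlineTracker`, `cosine_distance`, and `search_width` are hypothetical, and the one-hot "posteriograms" are toy data, not output of any real acoustic model.

```python
import numpy as np


def cosine_distance(a, b):
    """Return 1 - cosine similarity between two probability vectors."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return 1.0 - float(a @ b) / denom if denom > 0 else 1.0


class OnlineTracker:
    """Toy online DTW-style tracker (illustrative assumption, not the paper's method)."""

    def __init__(self, reference, search_width=10):
        # reference: (num_frames, num_phonemes) matrix computed beforehand
        self.reference = np.asarray(reference, dtype=float)
        self.search_width = search_width
        self.position = 0        # current estimate of the reference frame index
        self.prev_cost = None    # accumulated costs for the previous live frame

    def step(self, frame):
        """Consume one live frame and return the updated reference position."""
        n = len(self.reference)
        lo = max(0, self.position - self.search_width)
        hi = min(n, self.position + self.search_width + 1)
        cost = np.full(n, np.inf)
        for j in range(lo, hi):
            d = cosine_distance(frame, self.reference[j])
            # Step that advances in the reference only (same live frame).
            below = cost[j - 1] if j > 0 else np.inf
            if self.prev_cost is None:
                # First live frame: the path has to start at reference frame 0.
                best = 0.0 if j == 0 else below
            else:
                diag = self.prev_cost[j - 1] if j > 0 else np.inf
                best = min(self.prev_cost[j], diag, below)
            cost[j] = d + best
        self.prev_cost = cost
        # Report the cheapest reference frame inside the search window.
        self.position = lo + int(np.argmin(cost[lo:hi]))
        return self.position


# Toy data: the reference holds each of 4 phoneme classes for 2 frames,
# while the live input sings the same sequence with different timing.
ref_classes = [0, 0, 1, 1, 2, 2, 3, 3]
live_classes = [0, 1, 1, 1, 2, 3]
eye = np.eye(4)

tracker = OnlineTracker(eye[ref_classes], search_width=3)
positions = [tracker.step(eye[c]) for c in live_classes]
print(positions)
```

Each call to `step` touches only the frames inside the search window, which keeps the work per incoming frame bounded. If the reference recording carries lyrics timestamps, the tracked position can be mapped to the lyric line to display. A practical tracker would also need path-length normalization and recovery from tracking errors, which this sketch omits.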

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing