On-Line Audio-to-Lyrics Alignment Based on a Reference Performance
Charles Brazier, Gerhard Widmer

TL;DR
This paper introduces a real-time audio-to-lyrics alignment system that tracks lyrics across different languages using phoneme probability predictions and a reference performance, and demonstrates its robustness on opera recordings.
Contribution
The work presents the first real-time audio-to-lyrics alignment pipeline that is robust across languages and does not require language-specific tuning.
Findings
Achieves accurate real-time lyrics tracking in classical opera
Demonstrates robustness to material outside the training languages, such as Jingju (Beijing opera)
Uses phoneme probability vectors and reference performances for alignment
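To make the idea of tracking against a reference performance concrete, here is a minimal sketch of one way such a tracker could work. It is not the authors' pipeline: it assumes an incremental dynamic-time-warping update with a cosine distance between phoneme probability vectors, and it keeps the full cost column rather than a search window. The class name `OnlineTracker`, the toy three-phoneme vectors, and the `ref_words` annotation are all hypothetical.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero phoneme probability vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


class OnlineTracker:
    """Hypothetical incremental DTW tracker (an illustration, not the paper's method).

    `reference` is a list of phoneme probability vectors extracted from the
    reference performance. Each call to `step` consumes one live frame and
    returns the index of the best-matching reference frame so far.
    """

    def __init__(self, reference):
        self.reference = reference
        self.cost = None  # accumulated cost column for the previous live frame

    def step(self, frame):
        local = [cosine_distance(frame, ref) for ref in self.reference]
        new = [0.0] * len(local)
        for j, dist in enumerate(local):
            if self.cost is None:
                # First live frame: paths may only run along the reference axis.
                best_prev = new[j - 1] if j > 0 else 0.0
            else:
                candidates = [self.cost[j]]  # live frame advances, reference stays
                if j > 0:
                    candidates.append(self.cost[j - 1])  # both advance
                    candidates.append(new[j - 1])        # reference advances
                best_prev = min(candidates)
            new[j] = dist + best_prev
        self.cost = new
        # Current position: reference frame with the lowest accumulated cost.
        return min(range(len(new)), key=new.__getitem__)


# Toy data (assumed): three reference frames, each dominated by one phoneme class.
reference = [
    [0.90, 0.05, 0.05],
    [0.05, 0.90, 0.05],
    [0.05, 0.05, 0.90],
]
ref_words = ["la", "don", "na"]  # hypothetical word label per reference frame

# A live performance sung at half the reference tempo: every frame occurs twice.
live = [frame for frame in reference for _ in range(2)]

tracker = OnlineTracker(reference)
positions = [tracker.step(frame) for frame in live]
shown_words = [ref_words[p] for p in positions]
```

In this toy run the estimated position moves monotonically from the first to the last reference frame even though the live input is twice as long, which is the kind of duration mismatch a reference-based tracker has to absorb. Because the comparison is made between phoneme probability vectors rather than against a language-specific lexicon, nothing in the sketch depends on the language being sung.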
Abstract
Audio-to-lyrics alignment has become an increasingly active research task in MIR, supported by the emergence of several open-source datasets of audio recordings with word-level lyrics annotations. However, there are still a number of open problems, such as a lack of robustness in the face of severe duration mismatches between audio and lyrics representation; a certain degree of language-specificity caused by acoustic differences across languages; and the fact that most successful methods in the field are not suited to work in real-time. Real-time lyrics alignment (tracking) would have many useful applications, such as fully automated subtitle display in live concerts and opera. In this work, we describe the first real-time-capable audio-to-lyrics alignment pipeline that is able to robustly track the lyrics of different languages, without additional language information. The proposed…
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
