Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics
Sungkyun Chang, Kyogu Lee

TL;DR
This paper presents an unsupervised method for lyrics-to-audio alignment that exploits repetitive vowel patterns in the singing voice, using weighted symmetric non-negative matrix factorization (WS-NMF) and canonical time warping (CTW) to improve alignment accuracy without relying on a pre-trained speech recognition model.
Contribution
The authors introduce an unsupervised approach that combines WS-NMF and CTW to align lyrics with singing audio, avoiding the difficulty that ASR-based methods have in adapting a speech model to individual singers.
Findings
Outperforms state-of-the-art unsupervised methods in experiments
Achieves better alignment accuracy after singing source separation
Effective on both Korean and English datasets
Abstract
Most previous approaches to lyrics-to-audio alignment relied on a pre-developed automatic speech recognition (ASR) system, which inherently had difficulty adapting the speech model to individual singers. A significant aspect missing from previous work is the self-learnability of repetitive vowel patterns in the singing voice, where the vowel parts are more consistent than the consonant parts. Based on this, our system first learns a discriminative subspace of vowel sequences using weighted symmetric non-negative matrix factorization (WS-NMF), taking the self-similarity of a standard acoustic feature as input. Then, we use canonical time warping (CTW), derived from a recent computer vision technique, to find an optimal spatiotemporal transformation between the text and the acoustic sequences. Experiments with Korean and English data sets showed that…
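To make the first stage of the abstract concrete, the following is a minimal sketch of factorizing a self-similarity matrix with symmetric NMF. It is not the authors' implementation: it uses plain (unweighted) symmetric NMF with a standard damped multiplicative update, so the weighting in WS-NMF is omitted, and the CTW stage is not shown. The function names `self_similarity` and `symmetric_nmf`, the toy three-dimensional features, and all parameter values are assumptions made for illustration only.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity of frame-wise features (n_frames x n_dims).

    Values are clipped to [0, 1] so the matrix is non-negative.
    """
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    return np.clip(unit @ unit.T, 0.0, 1.0)


def symmetric_nmf(S, rank, n_iter=500, beta=0.5, seed=0):
    """Approximate S by H @ H.T with H >= 0 (unweighted symmetric NMF).

    Uses the damped multiplicative update
        H <- H * ((1 - beta) + beta * (S H) / (H H^T H)).
    """
    rng = np.random.default_rng(seed)
    H = rng.random((S.shape[0], rank))
    for _ in range(n_iter):
        numer = S @ H
        denom = H @ (H.T @ H) + 1e-12  # small constant avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H


# Toy stand-in for acoustic features: two hypothetical "vowel" patterns
# repeating in the order A A B B A A B B.
pattern_a = np.array([1.0, 0.2, 0.0])
pattern_b = np.array([0.0, 0.3, 1.0])
features = np.stack([pattern_a, pattern_a, pattern_b, pattern_b,
                     pattern_a, pattern_a, pattern_b, pattern_b])

S = self_similarity(features)
H = symmetric_nmf(S, rank=2)

# Each frame is assigned to the component with the largest activation.
labels = H.argmax(axis=1)
print(labels)
```

In this toy setting, frames that share a repeated pattern receive the same component label, which is the sense in which the factorization exposes repetition. Which integer is assigned to which pattern depends on the random initialization.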
