Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

Sungkyun Chang; Kyogu Lee

arXiv:1701.06078·cs.SD·October 29, 2020

TL;DR

This paper presents an unsupervised method for lyrics-to-audio alignment that exploits repetitive vowel patterns in the singing voice. It combines weighted symmetric non-negative matrix factorization (WS-NMF) with canonical time warping (CTW) to align lyrics to audio without relying on a pre-trained speech recognition model.

Contribution

The authors introduce a novel unsupervised approach combining WS-NMF and CTW to align lyrics with singing audio, overcoming limitations of traditional ASR-based methods.

Findings

1. Outperforms state-of-the-art unsupervised methods in experiments

2. Achieves better alignment accuracy after singing source separation

3. Effective on both Korean and English datasets

Abstract

Most previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system, which inherently had difficulty adapting its speech model to individual singers. A significant aspect missing from previous work is the self-learnability of repetitive vowel patterns in the singing voice, whose vowel parts are more consistent than its consonant parts. Based on this, our system first learns a discriminative subspace of vowel sequences using weighted symmetric non-negative matrix factorization (WS-NMF), taking the self-similarity of a standard acoustic feature as input. Then, we use canonical time warping (CTW), derived from a recent computer vision technique, to find an optimal spatiotemporal transformation between the text and the acoustic sequences. Experiments with Korean and English data sets showed that…
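The first stage described in the abstract, factorizing a self-similarity matrix to expose repeated patterns, can be illustrated with a minimal sketch. This is not the authors' implementation: it uses plain (unweighted) symmetric NMF with a standard damped multiplicative update instead of WS-NMF, a cosine self-similarity, and synthetic toy features in place of real acoustic features. The function names, the rank, the damping factor `beta`, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def self_similarity(features):
    """Cosine self-similarity of a (frames x dims) feature matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    return np.clip(sim, 0.0, None)  # keep entries non-negative for NMF

def symmetric_nmf(A, rank, n_iter=200, beta=0.5, seed=0):
    """Approximate A by H @ H.T with H >= 0 (unweighted symmetric NMF).

    Uses a damped multiplicative update; this is a simplification of the
    weighted variant (WS-NMF) named in the abstract.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], rank))
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + 1e-12  # avoid division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H

# Toy input (hypothetical data): two alternating "vowel-like" patterns.
rng = np.random.default_rng(1)
a = np.array([1.0, 0.1, 0.0])
b = np.array([0.0, 0.1, 1.0])
features = np.vstack([a] * 10 + [b] * 10 + [a] * 10 + [b] * 10)
features = features + 0.01 * rng.random(features.shape)

A = self_similarity(features)   # (40, 40) self-similarity matrix
H = symmetric_nmf(A, rank=2)    # (40, 2) non-negative activations
labels = H.argmax(axis=1)       # dominant component per frame
```

Each row of `H` can be read as a soft assignment of a frame to one of `rank` recurring patterns, so frames belonging to the same repeated pattern should receive similar rows. The alignment stage (CTW) is not covered by this sketch.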
