# Lyrics-to-Audio Alignment by Unsupervised Discovery of Repetitive Patterns in Vowel Acoustics

**Authors:** Sungkyun Chang, Kyogu Lee

arXiv: 1701.06078 · 2020-10-29

## TL;DR

This paper presents an unsupervised method for lyrics-to-audio alignment that exploits repetitive vowel patterns in the singing voice. It combines weighted symmetric non-negative matrix factorization (WS-NMF) with canonical time warping (CTW) to align lyrics and audio without relying on a pre-developed automatic speech recognition (ASR) system.

## Contribution

The authors introduce an unsupervised approach that combines WS-NMF and CTW to align lyrics with singing audio, avoiding the difficulty that ASR-based methods have in adapting a speech model to individual singers.

## Key findings

- Reports more promising results than other state-of-the-art unsupervised approaches and an existing ASR-based system
- Obtains these results when deployed after a pre-developed, unsupervised singing source separation step
- Evaluated on both Korean and English data sets

## Abstract

Most of the previous approaches to lyrics-to-audio alignment used a pre-developed automatic speech recognition (ASR) system that innately suffered from several difficulties to adapt the speech model to individual singers. A significant aspect missing in previous works is the self-learnability of repetitive vowel patterns in the singing voice, where the vowel part used is more consistent than the consonant part. Based on this, our system first learns a discriminative subspace of vowel sequences, based on weighted symmetric non-negative matrix factorization (WS-NMF), by taking the self-similarity of a standard acoustic feature as an input. Then, we make use of canonical time warping (CTW), derived from a recent computer vision technique, to find an optimal spatiotemporal transformation between the text and the acoustic sequences. Experiments with Korean and English data sets showed that deploying this method after a pre-developed, unsupervised, singing source separation achieved more promising results than other state-of-the-art unsupervised approaches and an existing ASR-based system.
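The first stage described in the abstract, factorizing a self-similarity matrix to expose repeating patterns, can be illustrated roughly as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses plain (unweighted) symmetric NMF with a damped multiplicative update in place of WS-NMF, cosine similarity as the self-similarity measure, and random toy vectors instead of real acoustic features. The function names, the rank, and the iteration count are all hypothetical choices made for illustration.

```python
import numpy as np


def self_similarity(features):
    """Cosine self-similarity of a (frames x dims) feature matrix."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    unit = features / np.maximum(norms, 1e-12)
    sim = unit @ unit.T
    # Clip so every entry is non-negative, as NMF requires.
    return np.clip(sim, 0.0, None)


def symmetric_nmf(A, rank, n_iter=200, beta=0.5, seed=0):
    """Approximate A ~= H @ H.T with H >= 0.

    Assumption: unweighted symmetric NMF with a damped multiplicative
    update, used here as a simplified stand-in for WS-NMF.
    """
    rng = np.random.default_rng(seed)
    H = rng.random((A.shape[0], rank))
    for _ in range(n_iter):
        numer = A @ H
        denom = H @ (H.T @ H) + 1e-12  # small constant avoids division by zero
        H *= (1.0 - beta) + beta * numer / denom
    return H


# Toy input: 12 frames that cycle through three hypothetical "vowel" prototypes.
rng = np.random.default_rng(1)
prototypes = rng.random((3, 13))
labels = np.array([0, 0, 1, 1, 2, 2, 0, 0, 1, 1, 2, 2])
features = prototypes[labels] + 0.01 * rng.random((12, 13))

A = self_similarity(features)
H = symmetric_nmf(A, rank=3)

# Each row of H is a non-negative activation over the 3 learned components.
print(H.shape)
```

In this toy setting, frames that share a prototype produce near-identical rows in `A`, which is the kind of repetition the factorization is meant to capture. The subsequent CTW alignment step is not sketched here.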

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/1701.06078/full.md

---
Source: https://tomesphere.com/paper/1701.06078