Acoustic Modeling for Automatic Lyrics-to-Audio Alignment

Chitralekha Gupta; Emre Yılmaz; Haizhou Li

arXiv:1906.10369·eess.AS·June 26, 2019·5 cites

TL;DR

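This paper improves automatic lyrics-to-audio alignment for polyphonic music by adding speech- and music-informed features (such as voicing and auditory features) and by adapting acoustic models trained on solo singing vocals with a small amount of polyphonic data, which reduces the domain mismatch between training and test conditions.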

Contribution

It proposes combining speech- and music-informed features with acoustic model adaptation on a small amount of polyphonic data to improve alignment accuracy.

Findings

01. Significant reduction in alignment errors, especially on complex polyphonic tracks.

02. Improved robustness against spectro-temporal variations in singing vocals.

03. Effective adaptation of models trained on solo vocals to polyphonic music.

Abstract

Automatic lyrics-to-polyphonic-audio alignment is a challenging task, not only because the vocals are corrupted by background music, but also because annotated polyphonic corpora for effective acoustic modeling are lacking. In this work, we propose (1) using additional speech- and music-informed features and (2) adapting the acoustic models trained on a large amount of solo singing vocals towards polyphonic music using a small amount of in-domain data. Incorporating additional information such as voicing and auditory features together with conventional acoustic features aims to bring robustness against the increased spectro-temporal variations in singing vocals. By adapting the acoustic model using a small amount of polyphonic audio data, we reduce the domain mismatch between training and testing data. We perform several alignment experiments and present an in-depth alignment error analysis…
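The abstract's first proposal, using voicing and auditory features "together with" conventional acoustic features, can be illustrated as frame-wise concatenation of feature streams. The following is only a minimal sketch under that assumption: the function name `concat_frame_features`, the example values, and the choice of plain concatenation are hypothetical and are not taken from the paper, which may combine the features differently.

```python
from typing import List, Sequence


def concat_frame_features(
    base: Sequence[Sequence[float]],
    extra: Sequence[Sequence[float]],
) -> List[List[float]]:
    """Append the extra feature vector to the base vector of each frame.

    Both streams are assumed (hypothetically) to be computed on the same
    frame grid, so they must contain the same number of frames.
    """
    if len(base) != len(extra):
        raise ValueError(
            f"frame count mismatch: {len(base)} base vs {len(extra)} extra"
        )
    # One output vector per frame: conventional features, then extra ones.
    return [list(b) + list(e) for b, e in zip(base, extra)]


# Illustrative values only: two frames of 2-dim "conventional" features
# and a 1-dim voicing score per frame.
conventional = [[1.0, 2.0], [3.0, 4.0]]
voicing = [[0.9], [0.1]]
combined = concat_frame_features(conventional, voicing)
```

Each frame of `combined` then carries both kinds of information, which is the form in which a frame-level acoustic model would typically consume them.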

Taxonomy

Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing