Acoustic Modeling for Automatic Lyrics-to-Audio Alignment
Chitralekha Gupta, Emre Yılmaz, Haizhou Li

TL;DR
This paper improves automatic lyrics-to-audio alignment for polyphonic music by adding speech- and music-informed features and adapting acoustic models to reduce the domain mismatch between training and test data.
Contribution
It proposes combining speech- and music-informed features with adaptation of acoustic models, trained on solo singing vocals, to polyphonic music using a small amount of in-domain data.
Findings
Significant reduction in alignment errors, especially on complex polyphonic tracks.
Improved robustness against spectro-temporal variations in singing vocals.
Effective adaptation of models trained on solo vocals to polyphonic music.
Abstract
Automatic lyrics to polyphonic audio alignment is a challenging task, not only because the vocals are corrupted by background music but also because there is a lack of annotated polyphonic corpora for effective acoustic modeling. In this work, we propose (1) using additional speech- and music-informed features and (2) adapting the acoustic models trained on a large amount of solo singing vocals towards polyphonic music using a small amount of in-domain data. Incorporating additional information such as voicing and auditory features together with conventional acoustic features aims to bring robustness against the increased spectro-temporal variations in singing vocals. By adapting the acoustic model using a small amount of polyphonic audio data, we reduce the domain mismatch between training and testing data. We perform several alignment experiments and present an in-depth alignment error analysis…
Taxonomy
Topics: Music and Audio Processing · Speech Recognition and Synthesis · Speech and Audio Processing
