TL;DR
This paper introduces a novel computational approach for analyzing pronunciation variations in sung speech and proposes a singing-adapted pronunciation model that improves automatic lyrics transcription (ALT) accuracy.
Contribution
It presents a new pronunciation model tailored for singing and shares sentence-level annotations as a benchmark evaluation set for ALT, addressing the pronunciation aspect that current ALT research seldom covers.
Findings
The singing-adapted model outperforms the standard speech dictionary in word recognition experiments across all tested settings on multiple public datasets.
It reports the best ALT results on a cappella recordings when using n-gram language models.
It provides a new benchmark evaluation set for ALT.
Abstract
Recent automatic lyrics transcription (ALT) approaches focus on building stronger acoustic models or in-domain language models, while the pronunciation aspect is seldom touched upon. This paper applies a novel computational analysis to the pronunciation variances in sung utterances and further proposes a new pronunciation model adapted for singing. The singing-adapted model is tested on multiple public datasets via word recognition experiments. It performs better than the standard speech dictionary in all settings, reporting the best results on ALT in a cappella recordings using n-gram language models. For reproducibility, we share the sentence-level annotations used in testing, providing a new benchmark evaluation set for ALT.
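To make the idea of a pronunciation model more concrete, here is a minimal sketch of extending a speech lexicon with extra pronunciation variants. The summary above does not describe the paper's actual adaptation rules, so the rule used here (repeating the last vowel of a word to mimic a sustained note) is purely a hypothetical illustration, and the names `add_sung_variants`, `speech_lexicon`, and `VOWELS` are assumptions introduced for this example.

```python
from typing import Dict, List

# ARPAbet-style vowel symbols (stress markers omitted for simplicity).
VOWELS = {"AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER", "EY",
          "IH", "IY", "OW", "OY", "UH", "UW"}

Pronunciation = List[str]
Lexicon = Dict[str, List[Pronunciation]]


def add_sung_variants(lexicon: Lexicon) -> Lexicon:
    """Return a new lexicon that keeps every spoken pronunciation and adds
    one variant per pronunciation with its last vowel repeated.

    The vowel-repetition rule is a hypothetical stand-in, not the paper's method.
    """
    adapted: Lexicon = {}
    for word, prons in lexicon.items():
        variants = [list(p) for p in prons]  # keep the original entries
        for pron in prons:
            # Index of the last vowel, or None if the entry has no vowel.
            idx = max((i for i, ph in enumerate(pron) if ph in VOWELS),
                      default=None)
            if idx is None:
                continue
            lengthened = pron[:idx + 1] + [pron[idx]] + pron[idx + 1:]
            if lengthened not in variants:
                variants.append(lengthened)
        adapted[word] = variants
    return adapted


# Toy speech dictionary (illustrative entries only).
speech_lexicon: Lexicon = {
    "love": [["L", "AH", "V"]],
    "baby": [["B", "EY", "B", "IY"]],
}
sung_lexicon = add_sung_variants(speech_lexicon)
```

In a recognizer, a lexicon like `sung_lexicon` would sit between the acoustic model and the n-gram language model, so that each word can be matched through any of its listed phoneme sequences.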
