Computational Pronunciation Analysis in Sung Utterances

Emir Demirel; Sven Ahlback; Simon Dixon

arXiv:2106.10977·cs.IR·June 22, 2021

TL;DR

This paper introduces a novel computational approach to analyze pronunciation variations in sung speech and proposes a singing-adapted pronunciation model that improves automatic lyrics transcription accuracy.

Contribution

The paper presents a pronunciation model tailored to singing and releases sentence-level annotations as a benchmark evaluation set for ALT, addressing the pronunciation aspect that recent ALT work has seldom touched upon.

Findings

01

The singing-adapted model outperforms standard speech dictionaries in word recognition tasks.

02

It achieves the best reported results on ALT in a cappella recordings when using n-gram language models.

03

The shared sentence-level test annotations provide a new benchmark evaluation set for ALT.

Abstract

Recent automatic lyrics transcription (ALT) approaches focus on building stronger acoustic models or in-domain language models, while the pronunciation aspect is seldom touched upon. This paper applies a novel computational analysis of the pronunciation variances in sung utterances and further proposes a new pronunciation model adapted for singing. The singing-adapted model is tested on multiple public datasets via word recognition experiments. It performs better than the standard speech dictionary in all settings, reporting the best results on ALT in a cappella recordings using n-gram language models. For reproducibility, we share the sentence-level annotations used in testing, providing a new benchmark evaluation set for ALT.
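Word recognition experiments of this kind are commonly scored with word error rate (WER), the word-level edit distance between a reference lyric and the recognizer output, divided by the number of reference words. The abstract does not name its metric, so the following is only a minimal sketch under that assumption. The function name `word_error_rate`, the lowercase and whitespace normalization, and the example strings are illustrative and are not taken from the paper or its repository.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return word-level edit distance divided by the reference length.

    Assumption: text is normalized by lowercasing and splitting on
    whitespace; a real evaluation may normalize differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (before the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of five reference words.
    print(word_error_rate("the river runs so deep", "the river run so deep"))
```

Comparing this score for the same recordings decoded once with a standard speech dictionary and once with a singing-adapted one would be one way to reproduce the kind of comparison the abstract describes.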

Code & Models

Repositories

emirdemirel/ALTA (official)