Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings

Manuel Sam Ribeiro; Giulia Comini; Jaime Lorenzo-Trueba

arXiv:2307.16643 · eess.AS · August 1, 2023

Open Access

TL;DR

This paper proposes a method to improve grapheme-to-phoneme (G2P) conversion by learning pronunciations from speech recordings, reducing reliance on manually annotated pronunciation dictionaries and lowering phone error rates across multiple languages.

Contribution

The paper presents a semi-supervised approach that bootstraps a G2P model from a small annotated set, learns pronunciations for out-of-vocabulary words from speech data, and uses them to retrain the G2P model, reducing dependence on manually built pronunciation dictionaries.

Findings

1. Consistent reduction in phone error rate across multiple languages.
2. Effective bootstrapping of G2P models with minimal annotated data.
3. Improved pronunciation accuracy from speech recordings.
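
Phone error rate (PER) is conventionally computed like word error rate, but over phones: the minimum number of substitutions (S), deletions (D), and insertions (I) needed to turn the hypothesis into the reference, divided by the number of reference phones N, i.e. PER = (S + D + I) / N. The following is a minimal sketch of that standard definition, assuming phones are given as lists of symbol strings; the function names and the toy phone sequences are illustrative, not taken from the paper.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences, with unit costs."""
    # prev[j] holds the distance between the reference prefix processed so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            curr[j] = min(
                prev[j] + 1,             # deletion: reference phone r is missing
                curr[j - 1] + 1,         # insertion: extra hypothesis phone h
                prev[j - 1] + (r != h),  # substitution, or a free match if r == h
            )
        prev = curr
    return prev[-1]


def phone_error_rate(refs: List[Sequence[str]], hyps: List[Sequence[str]]) -> float:
    """Corpus-level PER = (S + D + I) / N over paired reference/hypothesis sequences."""
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must be paired one-to-one")
    errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    total = sum(len(r) for r in refs)
    return errors / total


refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ah", "t"], ["d", "ao", "g", "z"]]  # one substitution, one insertion
print(phone_error_rate(refs, hyps))  # 2 edits / 6 reference phones
```

Errors are summed over the whole corpus before dividing, so longer words weigh more than in a per-word average.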

Abstract

The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion benefits various speech processing applications, such as text-to-speech and speech recognition. However, G2P models tend to rely on manually annotated pronunciation dictionaries, which are often time-consuming and costly to acquire. In this paper, we propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings. Our approach bootstraps a G2P model with a small set of annotated examples. The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings into a phonetic representation. Given the hypothesized phoneme labels, we learn pronunciation dictionaries for out-of-vocabulary words, and we use those to re-train the G2P system. Results indicate that our approach consistently…
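
The abstract does not say how a single pronunciation is chosen from the phone recognizer's output for each out-of-vocabulary (OOV) word. As a minimal sketch, assuming every spoken occurrence of an OOV word has already been aligned to a hypothesized phone sequence, one plausible rule is to keep the medoid hypothesis (the one closest, by total edit distance, to all the others) and to skip words seen too rarely. The function `learn_oov_pronunciations`, the `min_occurrences` threshold, the medoid rule, and the toy words are all hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

Pron = Tuple[str, ...]


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences, with unit costs."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (x != y))
        prev = curr
    return prev[-1]


def learn_oov_pronunciations(
    decoded: Iterable[Tuple[str, Sequence[str]]],
    lexicon: Dict[str, Pron],
    min_occurrences: int = 2,  # hypothetical evidence threshold
) -> Dict[str, Pron]:
    """Pick one pronunciation per OOV word from phone-recognizer hypotheses.

    `decoded` yields (word, hypothesized phones) pairs, one per spoken
    occurrence of the word (an assumed input format).
    """
    candidates: Dict[str, List[Pron]] = defaultdict(list)
    for word, phones in decoded:
        if word not in lexicon:  # only learn entries for out-of-vocabulary words
            candidates[word].append(tuple(phones))

    learned: Dict[str, Pron] = {}
    for word, hyps in candidates.items():
        if len(hyps) < min_occurrences:
            continue  # too little acoustic evidence to trust the hypotheses
        # Medoid: the hypothesis with the smallest total edit distance to all others
        learned[word] = min(hyps, key=lambda h: sum(edit_distance(h, o) for o in hyps))
    return learned


lexicon = {"cat": ("k", "ae", "t")}
decoded = [
    ("cat", ("k", "ah", "t")),         # in-vocabulary: ignored
    ("zorp", ("z", "ao", "r", "p")),
    ("zorp", ("z", "ao", "r", "p")),
    ("zorp", ("z", "ow", "r", "p")),   # noisy recognizer output
    ("blick", ("b", "l", "ih", "k")),  # seen once: skipped
]
learned = learn_oov_pronunciations(decoded, lexicon)
training_lexicon = {**lexicon, **learned}  # data for re-training the G2P model
print(learned)
```

The medoid is used here because it always returns a sequence the recognizer actually produced, while tolerating an occasional noisy decode; a simple majority vote over identical sequences would be a reasonable alternative.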

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling