Improving grapheme-to-phoneme conversion by learning pronunciations from speech recordings
Manuel Sam Ribeiro, Giulia Comini, Jaime Lorenzo-Trueba

TL;DR
This paper improves grapheme-to-phoneme (G2P) conversion by learning pronunciations from speech recordings, reducing reliance on manually annotated dictionaries and lowering phone error rate across languages.
Contribution
The paper presents a semi-supervised approach that learns pronunciations from speech data to improve G2P conversion, reducing dependence on manually annotated pronunciation dictionaries.
Findings
Consistent reduction in phone error rate across multiple languages.
Effective bootstrapping of G2P models with minimal annotated data.
Pronunciations learned from speech recordings improve G2P accuracy when used to re-train the model.
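The findings above are stated in terms of phone error rate. As a rough illustration of how such a metric is commonly computed (an assumption here; this summary does not give the paper's exact scoring setup), the sketch below takes the total edit distance between reference and hypothesized phone sequences and divides it by the total number of reference phones. The function names and the example phone symbols are hypothetical.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two phone sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference phone
                            curr[j - 1] + 1,      # insert a hypothesis phone
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def phone_error_rate(refs: Sequence[Sequence[str]],
                     hyps: Sequence[Sequence[str]]) -> float:
    """Total edit errors divided by total reference phones (assumed definition)."""
    total_phones = sum(len(r) for r in refs)
    if total_phones == 0:
        raise ValueError("reference set contains no phones")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_phones


# Hypothetical example: one deleted phone out of six reference phones.
refs = [["k", "ae", "t"], ["d", "ao", "g"]]
hyps = [["k", "ae", "t"], ["d", "ao"]]
per = phone_error_rate(refs, hyps)
```

A lower value means the G2P output is closer to the reference pronunciations.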
Abstract
The Grapheme-to-Phoneme (G2P) task aims to convert orthographic input into a discrete phonetic representation. G2P conversion is beneficial to various speech processing applications, such as text-to-speech and speech recognition. However, these tend to rely on manually-annotated pronunciation dictionaries, which are often time-consuming and costly to acquire. In this paper, we propose a method to improve the G2P conversion task by learning pronunciation examples from audio recordings. Our approach bootstraps a G2P with a small set of annotated examples. The G2P model is used to train a multilingual phone recognition system, which then decodes speech recordings with a phonetic representation. Given hypothesized phoneme labels, we learn pronunciation dictionaries for out-of-vocabulary words, and we use those to re-train the G2P system. Results indicate that our approach consistently…
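The dictionary-learning step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' method: the abstract does not say how hypothesized phone sequences are aggregated per word, so the majority vote, the `min_count` threshold, and all names below are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def learn_oov_lexicon(decoded: Iterable[Tuple[str, List[str]]],
                      known_words: Set[str],
                      min_count: int = 2) -> Dict[str, List[str]]:
    """Build a pronunciation dictionary for out-of-vocabulary words.

    decoded: (word, hypothesized phones) pairs from a phone recognizer.
    known_words: words already covered by the annotated seed dictionary.
    min_count: assumed filter; keep a pronunciation only if it was seen
               at least this many times.
    """
    votes: Dict[str, Counter] = defaultdict(Counter)
    for word, phones in decoded:
        if word not in known_words:
            votes[word][tuple(phones)] += 1

    lexicon: Dict[str, List[str]] = {}
    for word, counter in votes.items():
        phones, count = counter.most_common(1)[0]  # majority vote (assumption)
        if count >= min_count:
            lexicon[word] = list(phones)
    return lexicon


# Hypothetical decoded output: "cat" is already in the seed dictionary.
decoded = [
    ("cat", ["k", "ae", "t"]),
    ("zebra", ["z", "iy", "b", "r", "ah"]),
    ("zebra", ["z", "eh", "b", "r", "ah"]),
    ("zebra", ["z", "iy", "b", "r", "ah"]),
    ("yak", ["y", "ae", "k"]),
]
new_entries = learn_oov_lexicon(decoded, known_words={"cat"})
```

In a pipeline like the one the abstract outlines, entries such as `new_entries` would be merged with the annotated seed examples before re-training the G2P model.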
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
