One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble
Kaili Vesik (1), Muhammad Abdul-Mageed (1), Miikka Silfverberg (1), ((1) The University of British Columbia)

TL;DR
This paper presents a multilingual Transformer ensemble with self-training for grapheme-to-phoneme (G2P) conversion in 15 languages, aimed at settings with little training data, and reports a sizeable improvement over the SIGMORPHON 2020 Shared Task 1 baselines.
Contribution
The paper introduces a simple approach to G2P conversion that combines ensembles of multilingual Transformer models with self-training, and that outperforms the shared task baselines.
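To make the idea of combining an ensemble with self-training concrete, here is a minimal sketch. It is not the authors' implementation: this summary does not say how ensemble members are combined or how self-training examples are chosen, so the majority-vote rule, the agreement threshold, and every name below (`majority_vote`, `agreement`, `select_pseudo_labels`, `min_agreement`) are assumptions made for illustration.

```python
from collections import Counter
from typing import Dict, List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Return the phoneme sequence predicted by the most ensemble members.

    Assumption: ties are broken in favour of the earliest model in the list.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(tuple(p) for p in predictions)
    best = max(counts.values())
    for p in predictions:
        if counts[tuple(p)] == best:
            return list(p)
    raise AssertionError("unreachable")


def agreement(predictions: Sequence[Sequence[str]]) -> float:
    """Fraction of ensemble members that back the most common prediction."""
    counts = Counter(tuple(p) for p in predictions)
    return max(counts.values()) / len(predictions)


def select_pseudo_labels(
    unlabeled: Dict[str, List[List[str]]],
    min_agreement: float = 1.0,  # hypothetical threshold, not from the paper
) -> Dict[str, List[str]]:
    """Keep unlabeled words whose ensemble predictions agree strongly enough.

    The selected (word, phonemes) pairs could then be added to the training
    set for another round of training, which is the self-training step.
    """
    selected = {}
    for word, preds in unlabeled.items():
        if agreement(preds) >= min_agreement:
            selected[word] = majority_vote(preds)
    return selected


# Toy example: three hypothetical models transcribe two unlabeled words.
ensemble_outputs = {
    "cat": [["k", "æ", "t"], ["k", "æ", "t"], ["k", "æ", "t"]],
    "dog": [["d", "ɔ", "g"], ["d", "o", "g"], ["d", "ɔ", "g"]],
}
pseudo = select_pseudo_labels(ensemble_outputs, min_agreement=1.0)
```

With the strict threshold, only "cat" is kept, because the three models disagree on "dog". Lowering `min_agreement` trades label quality for more pseudo-labeled data.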
Findings
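Best models achieve 14.99 word error rate (WER) and 3.30 phoneme error rate (PER)
Designed for settings where only a small amount of training data is available
Sizeable improvement over the SIGMORPHON 2020 Shared Task 1 baselines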
Abstract
The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and synthesis. As with other speech and language processing tasks, learning G2P models is challenging when only a small amount of training data is available. We describe a simple approach that exploits model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages. Our models were developed as part of our participation in the SIGMORPHON 2020 Shared Task 1, which focused on G2P. Our best models achieve 14.99 word error rate (WER) and 3.30 phoneme error rate (PER), a sizeable improvement over the competitive shared task baselines.
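The two metrics quoted above can be illustrated with a short sketch. It assumes the usual definitions: WER is the percentage of words whose predicted phoneme sequence differs from the reference, and PER is the total phoneme-level edit distance divided by the total reference length, also as a percentage. The shared task's official scoring script may differ in detail, and the function names and toy data below are hypothetical.

```python
from typing import List, Sequence, Tuple


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two symbol sequences."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def wer_per(pairs: List[Tuple[List[str], List[str]]]) -> Tuple[float, float]:
    """Return (WER, PER) as percentages over (reference, hypothesis) pairs."""
    wrong_words = 0
    total_edits = 0
    total_ref_len = 0
    for ref, hyp in pairs:
        dist = edit_distance(ref, hyp)
        wrong_words += int(dist > 0)  # a word counts as wrong if any phoneme differs
        total_edits += dist
        total_ref_len += len(ref)
    wer = 100.0 * wrong_words / len(pairs)
    per = 100.0 * total_edits / total_ref_len
    return wer, per


# Toy data: (reference phonemes, predicted phonemes) for three words.
toy_pairs = [
    (["k", "æ", "t"], ["k", "æ", "t"]),       # correct
    (["d", "ɔ", "g"], ["d", "o", "g"]),       # one substitution
    (["f", "ɪ", "ʃ"], ["f", "ɪ", "s", "h"]),  # one substitution, one insertion
]
wer, per = wer_per(toy_pairs)
```

On this toy data two of three words are wrong and three edits are needed against nine reference phonemes. WER is therefore much higher than PER, the same pattern as in the reported 14.99 WER and 3.30 PER: a single wrong phoneme makes the whole word count as an error.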
