One Model to Pronounce Them All: Multilingual Grapheme-to-Phoneme Conversion With a Transformer Ensemble

Kaili Vesik, Muhammad Abdul-Mageed, Miikka Silfverberg (The University of British Columbia)

arXiv:2006.13343 · cs.CL · June 25, 2020

TL;DR

This paper presents a multilingual Transformer ensemble with self-training for grapheme-to-phoneme (G2P) conversion in 15 languages. It improves substantially over the SIGMORPHON 2020 shared task baselines and is especially useful when training data are scarce.
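A single model can cover all 15 languages if their training data are pooled. One common way to do this, assumed here rather than taken from the paper, is to prefix each input word with a language tag so the shared Transformer knows which orthography it is reading. The sketch below only prepares the data; the tag format `<lang>`, the function name `to_multilingual`, and the toy Hungarian and Dutch entries are illustrative.

```python
import unicodedata
from typing import Dict, List, Tuple

# Hypothetical input format: language code -> list of (word, phoneme string)
Lexicon = Dict[str, List[Tuple[str, str]]]


def to_multilingual(lexicons: Lexicon) -> List[Tuple[List[str], List[str]]]:
    """Pool per-language lexicons into one training set for a shared model.

    Source tokens: an assumed language tag such as "<hun>", then the word's
    characters (graphemes). Target tokens: the space-separated phonemes.
    """
    pairs = []
    for lang in sorted(lexicons):  # deterministic order across languages
        for word, phonemes in lexicons[lang]:
            # NFC normalization keeps precomposed letters like "á" as one grapheme
            graphemes = list(unicodedata.normalize("NFC", word))
            src = [f"<{lang}>"] + graphemes
            tgt = phonemes.split()
            pairs.append((src, tgt))
    return pairs


if __name__ == "__main__":
    toy = {"hun": [("ház", "h aː z")], "dut": [("kat", "k ɑ t")]}
    for src, tgt in to_multilingual(toy):
        print(src, "->", tgt)
```

Putting the tag on the source side lets one set of parameters share what the languages have in common while still conditioning each prediction on the language.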

Contribution

It introduces a simple yet effective G2P method that combines an ensemble of multilingual Transformers with self-training and outperforms the shared task baselines.
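An ensemble needs a rule for merging its members' outputs. A minimal sketch, assuming simple sequence-level majority voting (not necessarily the combination rule the authors used): each model predicts a full phoneme sequence per word, and the most frequent sequence wins. `majority_vote` and `ensemble_predict` are hypothetical names.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[str]]) -> List[str]:
    """Pick the phoneme sequence predicted by the most ensemble members.

    Sequences are compared as whole tuples. Ties go to the tied candidate
    encountered first in member order (Counter.most_common keeps insertion
    order for equal counts), so listing the strongest model first makes it
    the tie-breaker.
    """
    counts = Counter(tuple(p) for p in predictions)
    best, _ = counts.most_common(1)[0]
    return list(best)


def ensemble_predict(
    member_outputs: Sequence[Sequence[Sequence[str]]],
) -> List[List[str]]:
    """member_outputs[m][w] is model m's phoneme sequence for word w."""
    n_words = len(member_outputs[0])
    return [
        majority_vote([outputs[w] for outputs in member_outputs])
        for w in range(n_words)
    ]


if __name__ == "__main__":
    a = [["k", "æ", "t"], ["d", "ɒ", "g"]]
    b = [["k", "æ", "t"], ["d", "ɔ", "g"]]
    c = [["k", "a", "t"], ["d", "ɔ", "g"]]
    print(ensemble_predict([a, b, c]))
```

Voting over whole sequences avoids producing a mixed output that no single member proposed; per-position voting is an alternative but requires aligned outputs.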

Findings

1. Achieved 14.99 WER and 3.30 PER on the SIGMORPHON 2020 G2P task
2. Effective in low-resource settings, where only small training sets are available
3. Outperformed the shared task baselines
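The two metrics in the findings can be computed as follows, under one common definition assumed here: WER is the percentage of words whose predicted phoneme sequence differs from the gold one, and PER is the total Levenshtein distance between predicted and gold phoneme sequences divided by the total number of gold phonemes. The official shared task scorer may normalize differently; `edit_distance` and `wer_per` are hypothetical names.

```python
from typing import List, Sequence, Tuple


def edit_distance(a: Sequence[str], b: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,             # delete x
                cur[j - 1] + 1,          # insert y
                prev[j - 1] + (x != y),  # substitute (or match at cost 0)
            ))
        prev = cur
    return prev[-1]


def wer_per(
    hyps: List[List[str]], golds: List[List[str]]
) -> Tuple[float, float]:
    """Corpus-level WER and PER, both returned as percentages."""
    assert len(hyps) == len(golds)
    wrong_words = sum(h != g for h, g in zip(hyps, golds))
    total_edits = sum(edit_distance(h, g) for h, g in zip(hyps, golds))
    total_phonemes = sum(len(g) for g in golds)
    wer = 100 * wrong_words / len(golds)
    per = 100 * total_edits / total_phonemes
    return wer, per


if __name__ == "__main__":
    hyps = [["k", "æ", "t"], ["d", "ɔ", "g"]]
    golds = [["k", "æ", "t"], ["d", "ɒ", "g"]]
    print(wer_per(hyps, golds))
```

WER counts a word as wrong after a single phoneme error, which is why it is much higher than PER for the same predictions (14.99 versus 3.30 in the findings).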

Abstract

The task of grapheme-to-phoneme (G2P) conversion is important for both speech recognition and speech synthesis. As with other speech and language processing tasks, learning G2P models is challenging when only small training sets are available. We describe a simple approach that exploits model ensembles, based on multilingual Transformers and self-training, to develop a highly effective G2P solution for 15 languages. Our models were developed for our participation in SIGMORPHON 2020 Shared Task 1, which focuses on G2P. Our best models achieve a 14.99 word error rate (WER) and a 3.30 phoneme error rate (PER), a sizeable improvement over the competitive shared task baselines.
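Self-training adds automatically labeled words to the training data. The sketch below is a generic pseudo-labeling loop, not the authors' exact procedure: it trains an ensemble, labels unlabeled words, keeps only labels on which at least `min_agreement` of the members agree (an assumed filter), and retrains. The `train` callback stands in for actual Transformer training; it and all other names are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[str, str]        # (word, phoneme string)
Model = Callable[[str], str]  # word -> predicted phoneme string


def self_train(
    labeled: List[Pair],
    unlabeled: Sequence[str],
    train: Callable[[List[Pair]], List[Model]],
    min_agreement: float = 1.0,
    rounds: int = 1,
) -> Tuple[List[Model], List[Pair]]:
    """Generic self-training loop; returns final models and training data."""
    data = list(labeled)
    remaining = list(unlabeled)
    models = train(data)  # initial ensemble on gold data only
    for _ in range(rounds):
        kept, still_unlabeled = [], []
        for word in remaining:
            votes = Counter(m(word) for m in models)
            best, count = votes.most_common(1)[0]
            # Keep the pseudo-label only if enough members agree on it
            if count / len(models) >= min_agreement:
                kept.append((word, best))
            else:
                still_unlabeled.append(word)
        data.extend(kept)
        remaining = still_unlabeled
        models = train(data)  # retrain on gold + pseudo-labeled data
    return models, data
```

A strict agreement threshold adds fewer words per round but keeps noisy pseudo-labels out of the training data; lowering `min_agreement` trades precision for coverage.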
