ByT5 model for massively multilingual grapheme-to-phoneme conversion
Jian Zhu, Cong Zhang, David Jurgens

TL;DR
This paper introduces a multilingual grapheme-to-phoneme (G2P) conversion model based on ByT5. The model outperforms token-based models across around 100 languages, and the authors provide pretrained models for low-resource and zero-shot scenarios.
Contribution
The study develops and evaluates a byte-level ByT5 model for multilingual G2P, outperforming token-based models and offering pretrained weights for low-resource languages.
Findings
ByT5 outperforms mT5 in multilingual G2P tasks.
Multilingual ByT5 models generally lower the phone error rate (PER) relative to monolingual models.
Pretrained models improve low-resource and zero-shot G2P performance.
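These findings are stated in terms of phone error rate (PER). A common definition of PER is the Levenshtein distance (substitutions, insertions, deletions) between the predicted and reference phone sequences, divided by the number of reference phones. The sketch below implements that definition. It assumes whitespace-separated phones and corpus-level aggregation, either of which may differ from the paper's actual evaluation script. The function names and toy transcriptions are illustrative only.

```python
# Minimal sketch of phone error rate (PER), assuming whitespace-separated
# phones and corpus-level aggregation (total edits / total reference phones).
# This is a generic definition, not the paper's evaluation code.
from typing import List, Sequence, Tuple


def edit_distance(hyp: Sequence[str], ref: Sequence[str]) -> int:
    """Minimum substitutions + insertions + deletions turning hyp into ref."""
    prev = list(range(len(ref) + 1))  # distances from the empty hyp prefix
    for i, h in enumerate(hyp, start=1):
        curr = [i] + [0] * len(ref)
        for j, r in enumerate(ref, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete h
                curr[j - 1] + 1,         # insert r
                prev[j - 1] + (h != r),  # substitute (free if phones match)
            )
        prev = curr
    return prev[-1]


def phone_error_rate(pairs: List[Tuple[str, str]]) -> float:
    """Corpus-level PER over (hypothesis, reference) phone strings."""
    edits = sum(edit_distance(h.split(), r.split()) for h, r in pairs)
    total = sum(len(r.split()) for _, r in pairs)
    return edits / total


# Hypothetical toy data: one exact match, one vowel substitution.
pairs = [("k æ t", "k æ t"), ("d ɔ ɡ", "d ɑ ɡ")]
print(f"{phone_error_rate(pairs):.3f}")  # 0.167 (1 edit / 6 reference phones)
```

Aggregating edits over the whole corpus weights longer words more heavily than averaging per-word PER would. Reported numbers depend on which of the two conventions is used.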
Abstract
In this study, we tackle massively multilingual grapheme-to-phoneme (G2P) conversion by implementing G2P models based on ByT5. We curated a G2P dataset from various sources that covers around 100 languages and trained large-scale multilingual G2P models based on ByT5. We found that ByT5, which operates on byte-level inputs, significantly outperformed the token-based mT5 model in multilingual G2P. Pairwise comparison with monolingual models in these languages suggests that multilingual ByT5 models generally lower the phone error rate by jointly learning from a variety of languages. The pretrained model can further benefit low-resource G2P, either through zero-shot prediction on unseen languages or by providing pretrained weights for finetuning. These weights help the model converge to a lower phone error rate than randomly initialized weights. To facilitate future research on multilingual G2P, we…
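ByT5 reads raw UTF-8 bytes instead of a learned subword vocabulary, so one model can take words in any script without out-of-vocabulary tokens. A token-based model such as mT5 instead depends on a fixed SentencePiece vocabulary. The sketch below shows what byte-level input looks like. It assumes the Hugging Face `ByT5Tokenizer` id convention: ids 0, 1 and 2 are reserved for pad, eos and unk, and each byte is shifted by 3. The helper names `encode_bytes` and `decode_bytes` are hypothetical. This is not the authors' preprocessing code and does not include any language prefix their setup may use.

```python
# Minimal sketch of ByT5-style byte-level encoding.
# Assumption: follows the Hugging Face ByT5Tokenizer convention of reserving
# ids 0 (pad), 1 (eos), 2 (unk) and offsetting every UTF-8 byte by 3.
from typing import List

PAD_ID, EOS_ID, UNK_ID = 0, 1, 2
NUM_SPECIAL = 3  # number of reserved ids before the 256 byte ids


def encode_bytes(text: str, add_eos: bool = True) -> List[int]:
    """Map text to ids: one id per UTF-8 byte, shifted past the special ids."""
    ids = [b + NUM_SPECIAL for b in text.encode("utf-8")]
    if add_eos:
        ids.append(EOS_ID)
    return ids


def decode_bytes(ids: List[int]) -> str:
    """Invert encode_bytes, dropping any special ids."""
    raw = bytes(i - NUM_SPECIAL for i in ids if i >= NUM_SPECIAL)
    return raw.decode("utf-8", errors="ignore")


# Words from different scripts share one 256-symbol byte vocabulary.
for word in ["cat", "niño", "猫"]:
    ids = encode_bytes(word, add_eos=False)
    print(word, len(ids), ids, decode_bytes(ids))
```

Non-ASCII characters expand to several ids: "ñ" takes two bytes and "猫" takes three. Byte-level sequences are therefore longer than character sequences for many scripts. This is the usual trade-off for a tiny, script-agnostic vocabulary.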
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling
Methods: Transformer (Attention Is All You Need) · Linear Layer · Byte Pair Encoding · Multi-Head Attention · Dropout · SentencePiece · Inverse Square Root Schedule · Adafactor · Gated Linear Unit · Softmax
