Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction
Eunseop Yoon, Hee Suk Yoon, Dhananjaya Gowda, SooHwan Eom, Daehyeok Kim, John Harvill, Heting Gao, Mark Hasegawa-Johnson, Chanwoo Kim, Chang D. Yoo

TL;DR
This paper addresses the challenge of exposure bias in sentence-level G2P transduction using ByT5, proposing a loss-based sampling method to enhance performance in real-world applications.
Contribution
It introduces a novel loss-based sampling technique to mitigate exposure bias in sentence-level G2P transduction with ByT5.
Findings
Improved G2P performance with the proposed method.
Effective mitigation of exposure bias in auto-regressive models.
Enhanced usability for real-world G2P applications.
Abstract
The Text-to-Text Transfer Transformer (T5) has recently been considered for Grapheme-to-Phoneme (G2P) transduction. As a follow-up, ByT5, a tokenizer-free byte-level model based on T5, recently gave promising results on word-level G2P conversion by representing each input character with its corresponding UTF-8 encoding. Although it is generally understood that sentence-level or paragraph-level G2P can improve usability in real-world applications, as it is better suited to handling heteronyms and linking sounds between words, we find that using ByT5 for these scenarios is nontrivial. Since ByT5 operates on the character level, it requires longer decoding steps, which degrades performance due to the exposure bias commonly observed in auto-regressive generation models. This paper shows that the performance of sentence-level and paragraph-level G2P can be improved by…
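To make the sequence-length point in the abstract concrete, the following minimal sketch counts how many byte-level positions a word, a sentence, and a short phoneme string occupy when every UTF-8 byte becomes one token. The helper name `byte_ids`, the example strings, and the offset of 3 (standing in for reserved special-token ids) are illustrative assumptions, not details taken from the paper.

```python
def byte_ids(text: str, offset: int = 3) -> list[int]:
    """Map text to byte-level ids: one id per UTF-8 byte, shifted by `offset`.

    The offset is an assumption standing in for reserved special-token ids.
    """
    return [b + offset for b in text.encode("utf-8")]


word = "read"
sentence = "I read the book yesterday."
phonemes = "ɹɛd"  # hypothetical IPA output; non-ASCII symbols take 2 bytes each

for label, text in [("word", word), ("sentence", sentence), ("phonemes", phonemes)]:
    ids = byte_ids(text)
    print(f"{label}: {len(text)} characters -> {len(ids)} byte-level tokens")
```

Because an auto-regressive decoder emits one such token per step, a sentence-level target needs many more steps than a word-level one, and IPA symbols outside ASCII add further bytes. That gives errors from earlier steps more opportunities to compound, which is the setting where exposure bias matters.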
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Topic Modeling
Methods: Gated Linear Unit · Multi-Head Attention · Attention Is All You Need · Linear Layer · SentencePiece · Inverse Square Root Schedule · Position-Wise Feed-Forward Layer · Layer Normalization · Softmax · Attention Dropout
