A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion
Gosse Bouma

TL;DR
This paper introduces a finite-state, data-driven approach for converting written Dutch words into phonemes. Hand-crafted rules reach over 93% phoneme accuracy, and transformation-based learning raises this to 99%.
Contribution
It presents a finite-state method, based on leftmost longest-match replacement, for segmenting words into graphemes and converting them into phonemes, and combines it with transformation-based learning to improve conversion accuracy.
Findings
Over 93% phoneme accuracy with hand-crafted rules
Improved to 99% accuracy using transformation-based learning
Best system trained on only 40K words
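The gain from transformation-based learning listed above can be illustrated with a minimal sketch of a greedy, Brill-style learning loop. This is not the paper's system: the single rule template ("change phoneme X to Y when the next grapheme is Z"), the toy training data, the phoneme symbols, and the names `apply_rule`, `net_gain`, and `learn` are all assumptions made for illustration, and the sketch implements the plain greedy loop, not the `lazy' variant mentioned in the abstract.

```python
# Hypothetical sketch of greedy transformation-based learning (Brill-style).
# Each training item is (graphemes, predicted_phonemes, gold_phonemes),
# aligned one-to-one. "#" marks the end of the word.

def apply_rule(rule, graphemes, phonemes):
    """Rewrite phoneme src to dst wherever the following grapheme is nxt."""
    src, dst, nxt = rule
    out = list(phonemes)
    for i, p in enumerate(phonemes):
        following = graphemes[i + 1] if i + 1 < len(graphemes) else "#"
        if p == src and following == nxt:
            out[i] = dst
    return out

def net_gain(rule, data):
    """Correct phonemes after applying the rule minus correct phonemes before."""
    gain = 0
    for graphemes, predicted, gold in data:
        updated = apply_rule(rule, graphemes, predicted)
        after = sum(u == g for u, g in zip(updated, gold))
        before = sum(p == g for p, g in zip(predicted, gold))
        gain += after - before
    return gain

def learn(data, max_rules=10):
    """Repeatedly pick the rule with the best net gain and apply it."""
    data = [(g, list(p), gold) for g, p, gold in data]
    learned = []
    for _ in range(max_rules):
        # Propose one candidate rule per remaining error.
        candidates = set()
        for graphemes, predicted, gold in data:
            for i, (p, g) in enumerate(zip(predicted, gold)):
                if p != g:
                    following = graphemes[i + 1] if i + 1 < len(graphemes) else "#"
                    candidates.add((p, g, following))
        if not candidates:
            break
        best = max(sorted(candidates), key=lambda r: net_gain(r, data))
        if net_gain(best, data) <= 0:
            break
        learned.append(best)
        data = [(g, apply_rule(best, g, p), gold) for g, p, gold in data]
    return learned

# Toy data (assumed): a baseline that leaves word-final "d" voiced.
TOY_DATA = [
    (["h", "oe", "d"], ["h", "u", "d"], ["h", "u", "t"]),
    (["d", "a", "k"], ["d", "A", "k"], ["d", "A", "k"]),
]

rules = learn(TOY_DATA)
```

On this toy data the loop learns one rule, which rewrites "d" as "t" at the end of a word and leaves the word-initial "d" untouched. A real system would use many rule templates and a much larger training set.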
Abstract
A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes, and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved by using transformation-based learning. The phoneme accuracy of the best system (using a large set of rule templates and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches 99%.
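The leftmost longest-match strategy named in the abstract can be sketched in a few lines of ordinary Python. This is an illustration under stated assumptions, not the paper's finite-state implementation: the grapheme inventory `TOY_GRAPHEMES`, the mapping `TOY_PHONEMES`, the phoneme symbols, and the function names `segment` and `convert` are hypothetical, and the conversion here is context-free, whereas real conversion rules would typically depend on context.

```python
# Hypothetical toy inventory of Dutch graphemes (multi-letter units included).
TOY_GRAPHEMES = {"sch", "ch", "oe", "s", "c", "h", "o", "e", "n"}

# Hypothetical context-free grapheme-to-phoneme mapping for the toy example.
TOY_PHONEMES = {"sch": "sx", "oe": "u", "n": "n"}

def segment(word, graphemes):
    """Split word into graphemes, scanning left to right and always
    taking the longest inventory entry that matches at the current position."""
    max_len = max(len(g) for g in graphemes)
    result = []
    i = 0
    while i < len(word):
        # Try the longest possible chunk first, then shorter ones.
        for size in range(min(max_len, len(word) - i), 0, -1):
            chunk = word[i:i + size]
            if chunk in graphemes:
                result.append(chunk)
                i += size
                break
        else:
            # Unknown character: pass it through as a single-letter grapheme.
            result.append(word[i])
            i += 1
    return result

def convert(word, graphemes, phonemes):
    """Segment the word, then map each grapheme to a phoneme string.
    Graphemes without a mapping are copied unchanged."""
    return "".join(phonemes.get(g, g) for g in segment(word, graphemes))

segments = segment("schoen", TOY_GRAPHEMES)
transcription = convert("schoen", TOY_GRAPHEMES, TOY_PHONEMES)
```

For "schoen" the longest match at the first position is "sch", not "s", so the word is segmented as sch + oe + n before any conversion takes place.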
Taxonomy
Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques
