A Finite State and Data-Oriented Method for Grapheme to Phoneme Conversion

Gosse Bouma

arXiv:cs/0003074·cs.CL·May 23, 2007·23 cites

TL;DR

This paper presents a finite-state, data-driven method for converting written Dutch words into phonemes. Hand-crafted conversion rules give a baseline of over 93% phoneme accuracy, and transformation-based learning raises this to 99%.

Contribution

It combines finite-state rules based on leftmost longest-match replacement with transformation-based learning to improve grapheme-to-phoneme conversion accuracy.

Findings

01

Over 93% phoneme accuracy with a small set of hand-crafted conversion rules for Dutch

02

Phoneme accuracy improved to 99% using transformation-based learning

03

Best system trained on only 40,000 words

Abstract

A finite-state method, based on leftmost longest-match replacement, is presented for segmenting words into graphemes, and for converting graphemes into phonemes. A small set of hand-crafted conversion rules for Dutch achieves a phoneme accuracy of over 93%. The accuracy of the system is further improved by using transformation-based learning. The best system (using a large set of rule templates and a 'lazy' variant of Brill's algorithm), trained on only 40K words, reaches a phoneme accuracy of 99%.
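
The leftmost longest-match segmentation step mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the paper's finite-state implementation: the grapheme inventory `MULTI_GRAPHEMES` and the function name `segment` are hypothetical, chosen only to show how scanning left to right and always preferring the longest known grapheme splits a word.

```python
# Hypothetical toy inventory of multi-letter Dutch graphemes.
# Assumption for illustration only; this is not the paper's rule set.
MULTI_GRAPHEMES = {"sch", "aa", "ee", "oo", "uu", "ie", "oe", "eu", "ui", "ij", "ch", "ng"}
MAX_LEN = max(len(g) for g in MULTI_GRAPHEMES)


def segment(word):
    """Split a word into graphemes by leftmost longest match."""
    graphemes = []
    i = 0
    while i < len(word):
        # Try the longest candidate at position i first;
        # a single letter always matches as a fallback.
        for length in range(min(MAX_LEN, len(word) - i), 0, -1):
            chunk = word[i:i + length]
            if length == 1 or chunk in MULTI_GRAPHEMES:
                graphemes.append(chunk)
                i += length
                break
    return graphemes


print(segment("school"))
print(segment("huis"))
```

With this toy inventory, `school` is split as `sch`, `oo`, `l` rather than `s`, `ch`, `oo`, `l`, because the three-letter match at the first position wins over any shorter alternative. A grapheme-to-phoneme stage would then map each segment to a phoneme in context.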

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques