Phonology-Augmented Statistical Framework for Machine Transliteration using Limited Linguistic Resources

Gia H. Ngo; Minh Nguyen; Nancy F. Chen

arXiv:1810.03184·cs.CL·February 21, 2019


TL;DR

This paper introduces a phonology-augmented statistical framework for machine transliteration. The framework incorporates the phonological rules of the target language and improves transliteration accuracy, particularly when only limited linguistic resources are available.

Contribution

The paper proposes pseudo-syllables as a way to model target-language phonology within a statistical transliteration framework, which improves performance when training data is scarce.

Findings

01. Outperforms the baseline by up to 44.68% with limited data

02. Effective for Vietnamese and Cantonese transliteration

03. Uses pseudo-syllables to encode phonological constraints

Abstract

Transliteration converts words in a source language (e.g., English) into words in a target language (e.g., Vietnamese). This conversion must respect the phonological structure of the target language, because the transliterated output needs to be pronounceable in that language. For example, a word in Vietnamese that begins with a consonant cluster is phonologically invalid and would therefore be an incorrect output of a transliteration system. Most statistical transliteration approaches, although widely adopted, do not explicitly model the target language's phonology, which often results in invalid outputs. The problem is compounded by the limited linguistic resources available when converting foreign words to transliterated words in the target language. In this work, we present a phonology-augmented statistical framework suitable for transliteration, especially when only limited linguistic…
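
The consonant-cluster example in the abstract can be illustrated with a toy validity check. The sketch below is not the paper's framework and does not implement pseudo-syllables. It only shows how a single phonological constraint (no word-initial consonant cluster) could be used to reject candidate outputs. The vowel set, the function names, and the candidate phoneme sequences are all assumptions made up for illustration, and they do not describe a real Vietnamese phoneme inventory.

```python
# Toy vowel set: an assumption for illustration only,
# NOT a real Vietnamese phoneme inventory.
VOWELS = {"a", "e", "i", "o", "u"}


def starts_with_consonant_cluster(phonemes):
    """Return True if the first two phonemes are both consonants."""
    if len(phonemes) < 2:
        return False
    return phonemes[0] not in VOWELS and phonemes[1] not in VOWELS


def filter_valid(candidates):
    """Keep only candidates that do not begin with a consonant cluster."""
    return [c for c in candidates if not starts_with_consonant_cluster(c)]


# Hypothetical candidate outputs, written as lists of phoneme symbols.
candidates = [
    ["s", "t", "o", "p"],       # begins with the cluster "s" + "t": rejected
    ["s", "a", "t", "o", "p"],  # a vowel breaks up the cluster: kept
]

valid = filter_valid(candidates)
print(valid)
```

A filter like this only discards invalid candidates after they have been generated. The abstract describes a different goal: modeling the target language's phonology inside the statistical framework itself.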
