String Transduction with Target Language Models and Insertion Handling

Garrett Nicolai, Saeed Najafi, and Grzegorz Kondrak

arXiv:1809.07182 · cs.CL · September 20, 2018

TL;DR

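This paper shows that combining target language models derived from unannotated target-language corpora with a precise alignment of the training data improves accuracy on character-level sequence transduction tasks, yielding state-of-the-art results on cognate projection, inflection generation, and phoneme-to-grapheme conversion.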

Contribution

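The paper frames character-level tasks whose output is a natural-language word as sequence-to-sequence transduction, and improves transduction accuracy by combining target language models, derived from unannotated target corpora, with a precise alignment of the training data.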
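
To make the target-language-model idea concrete, here is a minimal sketch, assuming a character bigram model with add-one smoothing trained on a list of unannotated target-language words, and assuming that model is used to rerank a transducer's n-best candidates by linearly interpolating the two log scores. The function names, the `lm_weight` parameter, the toy corpus, and the candidate scores are all hypothetical; the paper's system may use a higher-order model and may integrate it into decoding differently.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"


def train_char_bigram_lm(words):
    """Train an add-one smoothed character bigram model on unannotated target words.

    Returns a function mapping a word to its log-probability under the model.
    (Hypothetical helper: an assumed stand-in for the paper's target LM.)
    """
    bigrams = Counter()   # counts of (previous symbol, current symbol)
    contexts = Counter()  # counts of each previous symbol
    vocab = {EOS}         # next symbols observed in training
    for word in words:
        chars = [BOS] + list(word) + [EOS]
        vocab.update(word)
        for prev, cur in zip(chars, chars[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    v = len(vocab)

    def logprob(word):
        chars = [BOS] + list(word) + [EOS]
        # Sum of smoothed log P(cur | prev); unseen bigrams get the add-one floor.
        return sum(
            math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + v))
            for prev, cur in zip(chars, chars[1:])
        )

    return logprob


def rerank(candidates, lm_logprob, lm_weight=0.5):
    """Rerank (string, transducer_log_score) pairs by linearly interpolating
    the transducer score with the target LM score (assumed scheme)."""
    scored = [
        (cand, (1 - lm_weight) * score + lm_weight * lm_logprob(cand))
        for cand, score in candidates
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy unannotated target corpus and n-best list (hypothetical data).
    lm = train_char_bigram_lm(["walked", "talked", "jumped", "played"])
    nbest = [("walkde", -1.0), ("walked", -1.5)]
    print(rerank(nbest, lm)[0][0])
```

In this toy run the transducer score alone prefers the malformed `walkde`, but the character model, which has seen `ke`, `ed`, and word-final `d` in the corpus, moves `walked` to the top. Linear interpolation is only one design choice; treating the LM score as a weighted feature tuned on held-out data would be another.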

Findings

01. Achieved state-of-the-art results on cognate projection
02. Achieved state-of-the-art results on inflection generation
03. Achieved state-of-the-art results on phoneme-to-grapheme conversion

Abstract

Many character-level tasks can be framed as sequence-to-sequence transduction, where the target is a word from a natural language. We show that leveraging target language models derived from unannotated target corpora, combined with a precise alignment of the training data, yields state-of-the-art results on cognate projection, inflection generation, and phoneme-to-grapheme conversion.
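
Aligning the training data means pairing source and target characters before the transducer is trained. The sketch below is an illustration only, assuming unit-cost Levenshtein alignment and one common way to handle insertions (target characters with no source counterpart): attaching each inserted character to the preceding source character, so every aligned unit rewrites one source character into a target substring. Both functions and the example pair are hypothetical; they are not the paper's aligner or its insertion-handling scheme.

```python
def align(src, tgt):
    """Unit-cost Levenshtein alignment of two strings (hypothetical helper).

    Returns (source_char, target_char) pairs; "" marks a gap, so
    ("", t) is an insertion and (s, "") is a deletion.
    """
    n, m = len(src), len(tgt)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = 0 if src[i - 1] == tgt[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,        # delete src[i-1]
                dist[i][j - 1] + 1,        # insert tgt[j-1]
                dist[i - 1][j - 1] + sub,  # match or substitute
            )
    # Backtrace from the bottom-right cell, preferring match/substitution.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + int(src[i - 1] != tgt[j - 1]):
            pairs.append((src[i - 1], tgt[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            pairs.append((src[i - 1], ""))
            i -= 1
        else:
            pairs.append(("", tgt[j - 1]))
            j -= 1
    pairs.reverse()
    return pairs


def attach_insertions(pairs):
    """Fold each insertion into the target side of the preceding source character
    (assumed scheme). Insertions before the first source character attach to
    the following one instead."""
    merged, pending = [], ""
    for s, t in pairs:
        if s == "":
            if merged:
                prev_s, prev_t = merged[-1]
                merged[-1] = (prev_s, prev_t + t)
            else:
                pending += t
        else:
            merged.append((s, pending + t))
            pending = ""
    if pending:  # source was empty: nothing to attach the insertions to
        merged.append(("", pending))
    return merged


if __name__ == "__main__":
    # Hypothetical lemma -> inflected-form training pair.
    pairs = align("walk", "walked")
    print(pairs)
    print(attach_insertions(pairs))
```

For the pair `walk` → `walked`, the raw alignment ends in two insertions (`e`, `d`); after attachment the final unit becomes `k` → `ked`, so the model never has to generate characters from an empty source position.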
