String Transduction with Target Language Models and Insertion Handling
Garrett Nicolai, Saeed Najafi, and Grzegorz Kondrak

TL;DR
Combining target language models derived from unannotated target corpora with a precise alignment of the training data yields state-of-the-art results on several character-level sequence transduction tasks.
Contribution
The paper shows that target language models built from unannotated corpora, together with precise alignment of the training data, improve accuracy on sequence-to-sequence transduction tasks whose target is a natural-language word.
Findings
State-of-the-art results on cognate projection
State-of-the-art results on inflection generation
State-of-the-art results on phoneme-to-grapheme conversion
Abstract
Many character-level tasks can be framed as sequence-to-sequence transduction, where the target is a word from a natural language. We show that leveraging target language models derived from unannotated target corpora, combined with a precise alignment of the training data, yields state-of-the-art results on cognate projection, inflection generation, and phoneme-to-grapheme conversion.
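The abstract does not say how the target language model is combined with the transducer. As an illustration only, one simple way to use an unannotated target corpus is to rerank a transducer's candidate outputs with a character-level language model. The sketch below assumes an add-one-smoothed character bigram model and a log-linear combination of scores. The function names, the toy corpus, and the candidate scores are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def train_char_bigram_lm(words):
    """Count character bigrams over unannotated target-language words."""
    bigrams, contexts, vocab = Counter(), Counter(), set()
    for word in words:
        chars = ["<s>"] + list(word) + ["</s>"]
        vocab.update(chars)
        for prev, cur in zip(chars, chars[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    return bigrams, contexts, len(vocab)


def lm_log_prob(word, lm):
    """Add-one-smoothed bigram log-probability of a candidate word."""
    bigrams, contexts, vocab_size = lm
    chars = ["<s>"] + list(word) + ["</s>"]
    return sum(
        math.log((bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size))
        for prev, cur in zip(chars, chars[1:])
    )


def rerank(candidates, lm, lm_weight=1.0):
    """Sort (string, transducer_log_score) pairs by the combined score."""
    return sorted(
        candidates,
        key=lambda c: c[1] + lm_weight * lm_log_prob(c[0], lm),
        reverse=True,
    )


# Toy data: a tiny "unannotated corpus" and made-up transducer scores.
corpus = ["nation", "station", "ration", "motion"]
lm = train_char_bigram_lm(corpus)
candidates = [("nashun", -1.0), ("nation", -1.2)]
best = rerank(candidates, lm)[0][0]
```

In this toy setup the transducer alone slightly prefers the non-word, while the language model shifts the ranking toward the candidate that looks like a real target-language word. A real system would use a stronger language model and tune `lm_weight` on held-out data.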
