The first neural machine translation system for the Erzya language
David Dale

TL;DR
This paper introduces the first neural machine translation system for Erzya, an endangered language, and releases the datasets, models, and tools built for it. Translation between Erzya and Russian reaches BLEU scores of 17 (to Erzya) and 19 (to Russian); adaptation to 10 other languages remains low quality without additional parallel data.
Contribution
The paper presents the first neural translation system for Erzya and releases the translation models, the collected text corpus, a language identification model, and a multilingual sentence encoder adapted for Erzya, supporting future research on this low-resource language.
Findings
BLEU scores of 17 for translation to Erzya and 19 for translation to Russian
Over half of translations rated acceptable by native speakers
Adaptation to 10 other languages yields low quality without additional parallel data
Abstract
We present the first neural machine translation system for translation between the endangered Erzya language and Russian and the dataset collected by us to train and evaluate it. The BLEU scores are 17 and 19 for translation to Erzya and Russian respectively, and more than half of the translations are rated as acceptable by native speakers. We also adapt our model to translate between Erzya and 10 other languages, but without additional parallel data, the quality on these directions remains low. We release the translation models along with the collected text corpus, a new language identification model, and a multilingual sentence encoder adapted for the Erzya language. These resources will be available at https://github.com/slone-nlp/myv-nmt.
Taxonomy
Topics: Natural Language Processing Techniques · Translation Studies and Practices · Language and cultural evolution
