The first open machine translation system for the Chechen language
Abu-Viskhan A. Umishov, Vladislav A. Grigorian

TL;DR
This paper presents the first open-source machine translation system between Chechen and Russian, together with the datasets used to train and evaluate it, and explores fine-tuning NLLB-200 to add Chechen to a multilingual translation model.
Contribution
The paper introduces the first open-source Chechen-Russian translation model, with accompanying datasets and evaluation, and demonstrates fine-tuning as a way to integrate Chechen into a multilingual translation system.
Findings
BLEU score of 8.34 for Russian to Chechen translation and 20.89 for Chechen to Russian
ChrF++ score of 34.69 for Russian to Chechen translation and 44.55 for Chechen to Russian
Model and datasets are publicly available
Abstract
We introduce the first open-source model for translation between the vulnerable Chechen language and Russian, along with the dataset collected to train and evaluate it. We explore fine-tuning as a way to add a new language to NLLB-200, a large multilingual translation model. Our model achieves BLEU / ChrF++ scores of 8.34 / 34.69 for translation from Russian to Chechen and 20.89 / 44.55 for the reverse direction. The release of the translation models is accompanied by parallel corpora of words, phrases, and sentences and by a multilingual sentence encoder adapted to the Chechen language.
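For readers unfamiliar with ChrF++, the following is a minimal sketch of how such a score can be computed for a single hypothesis and reference. It is not the authors' evaluation code: the function names `ngrams` and `chrf_pp` are hypothetical, the defaults (character n-grams up to order 6, word n-grams up to order 2, beta = 2) are the values commonly associated with ChrF++, and the input strings in the usage note are toy English text rather than data from the paper. Reference implementations differ in tokenization and averaging details, so this sketch will not reproduce the reported numbers exactly.

```python
from collections import Counter


def ngrams(items, n):
    """Count all contiguous n-grams of length n in a sequence."""
    return Counter(tuple(items[i:i + n]) for i in range(len(items) - n + 1))


def chrf_pp(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified sentence-level ChrF++ on a 0-100 scale (illustrative only)."""
    precisions, recalls = [], []
    # Character n-grams are taken over the text with spaces removed;
    # word n-grams are taken over whitespace-separated tokens.
    views = [
        (list(hypothesis.replace(" ", "")), list(reference.replace(" ", "")), char_order),
        (hypothesis.split(), reference.split(), word_order),
    ]
    for hyp_items, ref_items, max_n in views:
        for n in range(1, max_n + 1):
            hyp_counts, ref_counts = ngrams(hyp_items, n), ngrams(ref_items, n)
            if not hyp_counts or not ref_counts:
                continue  # this order is longer than one of the texts
            # Counter intersection keeps the clipped (minimum) count per n-gram.
            overlap = sum((hyp_counts & ref_counts).values())
            precisions.append(overlap / sum(hyp_counts.values()))
            recalls.append(overlap / sum(ref_counts.values()))
    if not precisions:
        return 0.0
    # Average precision and recall over all n-gram orders, then combine
    # them into an F-beta score that weights recall more than precision.
    p = sum(precisions) / len(precisions)
    r = sum(recalls) / len(recalls)
    if p + r == 0:
        return 0.0
    return 100 * (1 + beta ** 2) * p * r / (beta ** 2 * p + r)
```

Calling `chrf_pp("the cat sat", "the cat sat down")` gives a score strictly between 0 and 100, while identical strings score 100 and strings with no shared characters score 0. Because the metric rewards partial character overlap, it tends to be more forgiving than BLEU for morphologically rich languages, which is consistent with the ChrF++ figures above being much higher than the BLEU figures.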
Taxonomy
Topics: Natural Language Processing Techniques
