The first open machine translation system for the Chechen language

Abu-Viskhan A. Umishov; Vladislav A. Grigorian

arXiv:2507.12672·cs.CL·July 18, 2025

TL;DR

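This paper presents the first open-source machine translation system for Chechen, covering translation between Chechen and Russian in both directions, and releases the accompanying datasets and evaluation results. It also explores fine-tuning NLLB-200 to add Chechen to a multilingual translation model.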

Contribution

It introduces the first open-source Chechen translation model, datasets, and evaluation, and demonstrates fine-tuning for integrating Chechen into multilingual translation systems.

Findings

01 BLEU score of 8.34 for Russian-to-Chechen translation

02 ChrF++ score of 34.69 for Russian-to-Chechen translation

03 Model and datasets are publicly available

Abstract

We introduce the first open-source model for translation between the vulnerable Chechen language and Russian, together with the dataset collected to train and evaluate it. We explore fine-tuning as a way to add a new language to NLLB-200, a large multilingual translation model. The BLEU / ChrF++ scores for our model are 8.34 / 34.69 for translation from Russian to Chechen and 20.89 / 44.55 for the reverse direction. The release of the translation models is accompanied by parallel corpora of words, phrases and sentences, and by a multilingual sentence encoder adapted to the Chechen language.
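For readers unfamiliar with ChrF++, the sketch below illustrates the idea behind the metric quoted in the abstract: an F-score over character n-grams (orders 1 to 6) combined with word n-grams (orders 1 and 2), with recall weighted by beta = 2. It is a simplified, sentence-level illustration and not the paper's evaluation code. The function name `chrf_pp_sketch` is hypothetical, and an established implementation such as sacreBLEU may differ in details like corpus-level aggregation and punctuation handling.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count all contiguous n-grams of a string or a list of tokens."""
    return Counter(tuple(seq[i:i + n]) for i in range(len(seq) - n + 1))


def chrf_pp_sketch(hypothesis, reference, char_order=6, word_order=2, beta=2.0):
    """Simplified sentence-level ChrF++-style score in the range 0..100.

    Assumption: an F-beta score is computed per n-gram order and the
    per-order scores are averaged; orders that are longer than either
    text are skipped.
    """
    # Character n-grams are taken over the text with whitespace removed.
    hyp_chars = "".join(hypothesis.split())
    ref_chars = "".join(reference.split())
    # Word n-grams use plain whitespace tokenization.
    hyp_words = hypothesis.split()
    ref_words = reference.split()

    jobs = [(hyp_chars, ref_chars, n) for n in range(1, char_order + 1)]
    jobs += [(hyp_words, ref_words, n) for n in range(1, word_order + 1)]

    scores = []
    for hyp_seq, ref_seq, n in jobs:
        hyp = ngram_counts(hyp_seq, n)
        ref = ngram_counts(ref_seq, n)
        hyp_total = sum(hyp.values())
        ref_total = sum(ref.values())
        if hyp_total == 0 or ref_total == 0:
            continue  # this order does not apply to such short input
        # Clipped matches: each n-gram counts at most as often as in the reference.
        matches = sum((hyp & ref).values())
        precision = matches / hyp_total
        recall = matches / ref_total
        denom = beta ** 2 * precision + recall
        f_beta = (1 + beta ** 2) * precision * recall / denom if denom > 0 else 0.0
        scores.append(f_beta)

    return 100.0 * sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy English strings, used only to show the call; not data from the paper.
    print(chrf_pp_sketch("the cat sat on the mat", "the cat sits on the mat"))
```

Because the score rewards partial character overlap, it tends to be more forgiving than BLEU for morphologically rich languages, which is consistent with the gap between the two metrics reported above.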

Code & Models

Models

NM-development/nllb-ce-rus-v0 (model, 93 downloads)

Datasets

NM-development/nmd-ce-ru-171k-v0 (dataset, 47 downloads)

Taxonomy

Topics: Natural Language Processing Techniques