Evaluating Extremely Low-Resource Machine Translation: A Comparative Study of ChrF++ and BLEU Metrics

Sanjeev Kumar; Preethi Jyothi; Pushpak Bhattacharyya

arXiv:2602.17425 · cs.CL · February 20, 2026

Open Access

TL;DR

This study compares BLEU and ChrF++ for evaluating machine translation in extremely low-resource languages and finds that BLEU, despite its lower absolute scores, provides lexical-precision insights that complement ChrF++.

Contribution

The paper provides a comparative analysis of BLEU and ChrF++ metrics specifically in extremely low-resource language translation scenarios, highlighting their respective strengths and limitations.

Findings

01. BLEU offers valuable lexical-precision insights in low-resource settings.

02. ChrF++ is effective in capturing character-level translation quality.

03. Using both metrics together improves evaluation interpretability.

Abstract

Evaluating machine translation (MT) quality in extremely low-resource language (ELRL) scenarios poses unique challenges, as widely used metrics such as BLEU, effective in high-resource settings, often misrepresent quality in data-scarce contexts. This work presents a comparative analysis of BLEU, an n-gram-based metric, and ChrF++, a character-based metric, for MT evaluation in ELRL settings. We examine how each metric responds to translation artifacts, including hallucinations, repetition, source-text copying, and diacritic (matra) variations across three ELRLs: Magahi, Bhojpuri, and Chhattisgarhi, with a focus on outputs from large language models (LLMs) and neural MT (NMT) systems. While recent work often relies solely on ChrF++, our findings show that BLEU, despite its lower absolute scores, provides complementary lexical-precision insights that improve interpretability.
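To make the contrast between word-level and character-level matching concrete, the following is a minimal sketch, not the paper's evaluation code and not the official BLEU or ChrF++ implementations. The function names (`char_ngrams`, `f_beta`, `toy_char_fscore`, `word_unigram_precision`) and the two short Devanagari strings are illustrative assumptions rather than items taken from the paper's data. The character score here averages F-beta over character n-grams of order 1 to 3 only, whereas ChrF++ as commonly configured uses longer character n-grams plus word n-grams, and BLEU combines several word n-gram orders with a brevity penalty.

```python
from collections import Counter


def char_ngrams(text, n):
    """Count character n-grams after dropping spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def f_beta(hyp_counts, ref_counts, beta=2.0):
    """F-beta over clipped n-gram matches; beta=2 weights recall above precision."""
    overlap = sum((hyp_counts & ref_counts).values())
    hyp_total = sum(hyp_counts.values())
    ref_total = sum(ref_counts.values())
    if overlap == 0 or hyp_total == 0 or ref_total == 0:
        return 0.0
    precision = overlap / hyp_total
    recall = overlap / ref_total
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


def toy_char_fscore(hyp, ref, max_n=3, beta=2.0):
    """Average character n-gram F-beta for n = 1..max_n (toy stand-in, not ChrF++)."""
    scores = [
        f_beta(char_ngrams(hyp, n), char_ngrams(ref, n), beta)
        for n in range(1, max_n + 1)
    ]
    return sum(scores) / len(scores)


def word_unigram_precision(hyp, ref):
    """Clipped word-level unigram precision (one ingredient of BLEU, not BLEU itself)."""
    hyp_counts = Counter(hyp.split())
    ref_counts = Counter(ref.split())
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    return sum((hyp_counts & ref_counts).values()) / total


# Hypothetical toy pair: the first word differs only in its final matra.
ref = "पानी पी"
hyp = "पानि पी"

char_score = toy_char_fscore(hyp, ref)         # partial credit for the near-miss word
word_score = word_unigram_precision(hyp, ref)  # the near-miss word counts as a full miss
```

In this toy pair, the word-level measure treats the matra variant as an entirely wrong token, while the character-level measure still credits the shared characters. That difference in granularity is the kind of behavior the abstract describes the two metrics as exposing in complementary ways.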

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Language and cultural evolution