IndicMT Eval: A Dataset to Meta-Evaluate Machine Translation metrics for Indian Languages

Ananya B. Sai; Vignesh Nagarajan; Tanay Dixit; Raj Dabre; Anoop Kunchukuttan; Pratyush Kumar; Mitesh M. Khapra

arXiv:2212.10180 · cs.CL · July 4, 2023 · 1 citation

TL;DR

This paper introduces IndicMT Eval, a dataset for meta-evaluating machine translation metrics for Indian languages. It finds that pre-trained metrics such as COMET correlate well with human judgments, but that existing metrics often miss fluency errors.

Contribution

The paper creates a new MQM dataset of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and uses it to systematically evaluate how well existing automatic metrics agree with annotator scores.

Findings

1. Pre-trained metrics like COMET show high correlation with human scores.
2. Current metrics inadequately capture fluency errors in Indian languages.
3. The dataset facilitates future research in MT evaluation for Indian languages.

Abstract

The rapid growth of machine translation (MT) systems has necessitated comprehensive studies to meta-evaluate evaluation metrics being used, which enables a better selection of metrics that best reflect MT quality. Unfortunately, most of the research focuses on high-resource languages, mainly English, the observations for which may not always apply to other languages. Indian languages, having over a billion speakers, are linguistically different from English, and to date, there has not been a systematic study of evaluating MT systems from English into Indian languages. In this paper, we fill this gap by creating an MQM dataset consisting of 7000 fine-grained annotations, spanning 5 Indian languages and 7 MT systems, and use it to establish correlations between annotator scores and scores obtained using existing automatic metrics. Our results show that pre-trained metrics, such as COMET,…
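The meta-evaluation step described in the abstract, correlating annotator scores with automatic metric scores, can be sketched as follows. This is a minimal illustration and not the authors' code: the segment scores, the metric names `metric_a` and `metric_b`, and the helper functions are all hypothetical. Pearson and Kendall tau are used here because they are common choices in metric meta-evaluation; which coefficients the paper reports is not stated in this excerpt.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant - discordant) / total pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(xs, ys), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical per-segment human scores (higher = better translation).
human_scores = [25.0, 18.5, 22.0, 9.0, 14.5, 20.0]

# Hypothetical automatic metric scores for the same six segments.
metric_scores = {
    "metric_a": [0.81, 0.62, 0.70, 0.35, 0.50, 0.66],
    "metric_b": [0.40, 0.45, 0.30, 0.38, 0.20, 0.42],
}

# Correlate each metric with the human scores.
correlations = {
    name: (pearson(human_scores, scores), kendall_tau(human_scores, scores))
    for name, scores in metric_scores.items()
}

for name, (r, tau) in correlations.items():
    print(f"{name}: pearson={r:.3f} kendall_tau={tau:.3f}")
```

A metric whose scores rank segments in the same order as the annotators gets a Kendall tau near 1, which is the sense in which a metric "reflects MT quality" in this kind of study.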

Code & Models

Repositories

ai4bharat/indicmt-eval
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Software Engineering Research