How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages?

Anushka Singh; Ananya B. Sai; Raj Dabre; Ratish Puduppully; Anoop Kunchukuttan; Mitesh M Khapra

arXiv:2406.03893 · cs.CL · June 7, 2024

TL;DR

This paper assesses the effectiveness of zero-shot machine translation evaluation methods for low-resource Indian languages, revealing significant gaps between automatic metrics and human judgments.

Contribution

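The paper meta-evaluates a wide range of automatic MT metrics in a zero-shot setting on four low-resource Indian languages (Assamese, Kannada, Maithili, and Punjabi), using newly collected MQM and DA annotations as test sets, and highlights the limitations of current approaches.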

Findings

01 Even learned metrics correlate only weakly with human judgments in the zero-shot setting: at most 0.32 Kendall Tau and 0.45 Pearson.

02 Synthetic data approaches show mixed results and overall do not close the gap by much.

03 Evaluation for low-resource languages remains an open problem.

Abstract

While machine translation evaluation has been studied primarily for high-resource languages, there has been a recent interest in evaluation for low-resource languages due to the increasing availability of data and models. In this paper, we focus on a zero-shot evaluation setting focusing on low-resource Indian languages, namely Assamese, Kannada, Maithili, and Punjabi. We collect sufficient Multi-Dimensional Quality Metrics (MQM) and Direct Assessment (DA) annotations to create test sets and meta-evaluate a plethora of automatic evaluation metrics. We observe that even for learned metrics, which are known to exhibit zero-shot performance, the Kendall Tau and Pearson correlations with human annotations are only as high as 0.32 and 0.45. Synthetic data approaches show mixed results and overall do not help close the gap by much for these languages. This indicates that there is still a long…
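The meta-evaluation described in the abstract compares metric scores against human annotations using Pearson and Kendall Tau correlations. The following is a minimal sketch of that comparison in plain Python, not the authors' evaluation code. The function names and the five score pairs are hypothetical and chosen only for illustration, and tau-b (the tie-corrected variant) is an assumption, since the abstract does not say which Kendall variant was used.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def kendall_tau_b(x, y):
    """Kendall tau-b: pairwise ranking agreement, corrected for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(list(zip(x, y)), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0:
            ties_x += 1  # pair tied in the first list
        if dy == 0:
            ties_y += 1  # pair tied in the second list
        if dx != 0 and dy != 0:
            if dx * dy > 0:
                concordant += 1  # both lists order the pair the same way
            else:
                discordant += 1  # the lists disagree on the order
    total_pairs = len(x) * (len(x) - 1) // 2
    denom = math.sqrt((total_pairs - ties_x) * (total_pairs - ties_y))
    return (concordant - discordant) / denom


# Made-up example: human quality scores and one metric's scores
# for the same five translations (not data from the paper).
human_scores = [90, 75, 60, 40, 20]
metric_scores = [0.80, 0.85, 0.55, 0.60, 0.30]

print("Pearson:", round(pearson(human_scores, metric_scores), 3))
print("Kendall tau-b:", round(kendall_tau_b(human_scores, metric_scores), 3))
```

Pearson measures linear agreement between the raw scores, while Kendall Tau only looks at whether the metric ranks pairs of translations in the same order as the annotators, which is why the two are usually reported together.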

Code & Models

Repositories

ai4bharat/indicmt-eval
PyTorch · Official

Datasets

ai4bharat/IndicMTEval
dataset · 121 dl

Videos

How Good is Zero-Shot MT Evaluation for Low Resource Indian Languages? · Underline

Taxonomy

Topics: Text Readability and Simplification · Natural Language Processing Techniques · Speech Recognition and Synthesis

Methods: Focus