On the Evaluation Practices in Multilingual NLP: Can Machine Translation Offer an Alternative to Human Translations?

Rochelle Choenni; Sara Rajaee; Christof Monz; Ekaterina Shutova

arXiv:2406.14267·cs.CL·June 21, 2024

TL;DR

This paper critically examines evaluation practices in multilingual NLP, highlights their limitations, and studies to what extent machine-translated test data can serve as a scalable alternative to human translations for assessing models across many languages, especially low-resource ones.

Contribution

It analyzes current evaluation frameworks, discusses their shortcomings, and empirically demonstrates the potential and limitations of using machine translation for large-scale multilingual model assessment.

Findings

01

High-resource language subsets are representative of broader language groups.

02

Evaluation often overestimates MLM performance on low-resource languages.

03

Simple baselines can perform well without extensive multilingual pretraining.

Abstract

While multilingual language models (MLMs) have been trained on 100+ languages, they are typically only evaluated across a handful of them due to a lack of available test data in most languages. This is particularly problematic when assessing MLMs' potential for low-resource and unseen languages. In this paper, we present an analysis of existing evaluation frameworks in multilingual NLP, discuss their limitations, and propose several directions for more robust and reliable evaluation practices. Furthermore, we empirically study to what extent machine translation offers a reliable alternative to human translation for large-scale evaluation of MLMs across a wide set of languages. We use a SOTA translation model to translate test data from 4 tasks to 198 languages and use them to evaluate three MLMs. We show that while the selected subsets of high-resource test languages are generally…
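The comparison the abstract describes (scoring a model on human-translated test data and on machine-translated test data for the same language) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the placeholder language codes `lang_a`/`lang_b`/`lang_c`, and the toy predictions are all hypothetical, and accuracy is assumed as the metric purely for illustration.

```python
from statistics import mean


def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if not labels or len(predictions) != len(labels):
        raise ValueError("predictions and labels must be non-empty and equal in length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def compare_test_sets(human_scores, mt_scores):
    """Compare per-language scores on human- vs machine-translated test sets.

    Both arguments map a language code to a score. Only languages present in
    both mappings are compared. Returns the per-language gap (MT minus human)
    and the mean absolute gap.
    """
    shared = sorted(set(human_scores) & set(mt_scores))
    if not shared:
        raise ValueError("no languages in common")
    gaps = {lang: mt_scores[lang] - human_scores[lang] for lang in shared}
    return gaps, mean(abs(gap) for gap in gaps.values())


# Toy data (hypothetical): gold labels and model predictions for one language.
labels = [1, 0, 1, 1]
human_acc = accuracy([1, 0, 1, 0], labels)  # predictions on the human-translated set
mt_acc = accuracy([1, 0, 0, 0], labels)     # predictions on the machine-translated set

human_scores = {"lang_a": human_acc, "lang_b": 0.5}
mt_scores = {"lang_a": mt_acc, "lang_b": 0.5, "lang_c": 0.9}  # lang_c has no human set

gaps, mean_abs_gap = compare_test_sets(human_scores, mt_scores)
print(gaps, mean_abs_gap)
```

A small mean absolute gap over the languages that have human translations would suggest, under these assumptions, that machine-translated scores track human-translated ones; languages such as the hypothetical `lang_c`, which only have machine-translated data, cannot be checked this way.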

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training