Learning to Evaluate Translation Beyond English: BLEURT Submissions to the WMT Metrics 2020 Shared Task

Thibault Sellam; Amy Pu; Hyung Won Chung; Sebastian Gehrmann; Qijun Tan; Markus Freitag; Dipanjan Das; Ankur P. Parikh

arXiv:2010.04297 · cs.CL · October 21, 2020 · 20 cites

TL;DR

This paper extends BLEURT, a transfer learning-based evaluation metric, to multiple languages beyond English, demonstrating its effectiveness in the WMT 2020 Shared Task for machine translation quality assessment.

Contribution

The authors adapt BLEURT for multilingual evaluation and combine it with other metrics, improving translation quality assessment across diverse language pairs.

Findings

01. BLEURT performs competitively on WMT 2019 and 2020 tasks.

02. Multilingual extension of BLEURT shows promising results.

03. Combining BLEURT with other metrics enhances evaluation accuracy.

Abstract

The quality of machine translation systems has dramatically improved over the last decade, and as a result, evaluation has become an increasingly challenging problem. This paper describes our contribution to the WMT 2020 Metrics Shared Task, the main benchmark for automatic evaluation of translation. We make several submissions based on BLEURT, a previously published metric based on transfer learning. We extend the metric beyond English and evaluate it on 14 language pairs for which fine-tuning data is available, as well as 4 "zero-shot" language pairs, for which we have no labelled examples. Additionally, we focus on English to German and demonstrate how to combine BLEURT's predictions with those of YiSi and use alternative reference translations to enhance the performance. Empirical results show that the models achieve competitive results on the WMT Metrics 2019 Shared Task,…
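
The abstract mentions two ideas that can be illustrated in a few lines: combining the predictions of two metrics, and scoring a candidate against more than one reference translation. The sketch below is only an illustration under stated assumptions. The abstract does not say how the combination is performed, so averaging z-normalized scores and averaging over references are choices made here for the example, and the names `z_normalize`, `combine_metrics`, and `score_with_references` are hypothetical helpers, not part of BLEURT or YiSi.

```python
from statistics import mean, pstdev


def z_normalize(scores):
    """Rescale scores to zero mean and unit variance so that two
    metrics with different ranges become comparable."""
    mu = mean(scores)
    sigma = pstdev(scores)
    if sigma == 0:
        # All scores identical: no information, map everything to 0.
        return [0.0 for _ in scores]
    return [(s - mu) / sigma for s in scores]


def combine_metrics(scores_a, scores_b):
    """Assumed combination rule: average the z-normalized scores of
    two metrics, segment by segment."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both metrics must score the same segments")
    za = z_normalize(scores_a)
    zb = z_normalize(scores_b)
    return [(a + b) / 2 for a, b in zip(za, zb)]


def score_with_references(score_fn, candidate, references):
    """Assumed multi-reference rule: average a metric's score over
    all available reference translations."""
    if not references:
        raise ValueError("at least one reference is required")
    return mean([score_fn(candidate, ref) for ref in references])


if __name__ == "__main__":
    # Toy numbers standing in for per-segment metric outputs.
    metric_a = [0.1, 0.4, 0.9]
    metric_b = [55.0, 60.0, 80.0]
    print(combine_metrics(metric_a, metric_b))

    # Toy scoring function (word overlap), not a real metric.
    overlap = lambda c, r: float(len(set(c.split()) & set(r.split())))
    print(score_with_references(overlap, "a b c", ["a b", "a b c"]))
```

Normalizing before averaging keeps a metric with a wider numeric range from dominating the combined score; other schemes, such as a learned weighting, would fit the same interface.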

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification