Difficulty-Aware Machine Translation Evaluation

Runzhe Zhan; Xuebo Liu; Derek F. Wong; Lidia S. Chao

arXiv:2107.14402 · cs.CL · August 2, 2021

TL;DR

This paper introduces a difficulty-aware evaluation metric for machine translation that gives more weight to translations most MT systems fail to produce, improving correlation with human judgments, especially among competitive systems.

Contribution

The paper proposes a novel MT evaluation metric that incorporates translation difficulty, addressing a limitation of existing metrics, which pay the same attention to every sentence component.

Findings

01

Outperforms commonly used MT metrics in correlation with human judgments on the WMT19 English-German Metrics shared tasks.

02

Effectively distinguishes between highly competitive MT systems.

03

Maintains robustness even when systems are very similar.

Abstract

The high-quality translation results produced by machine translation (MT) systems still pose a huge challenge for automatic evaluation. Current MT evaluation pays the same attention to each sentence component, while the questions of real-world examinations (e.g., university examinations) have different difficulties and weightings. In this paper, we propose a novel difficulty-aware MT evaluation metric, expanding the evaluation dimension by taking translation difficulty into consideration. A translation that fails to be predicted by most MT systems will be treated as a difficult one and assigned a large weight in the final score function, and conversely. Experimental results on the WMT19 English-German Metrics shared tasks show that our proposed method outperforms commonly used MT metrics in terms of human correlation. In particular, our proposed method performs well even when all the MT…
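The weighting idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the per-token scores in [0, 1], and the choice of "one minus the mean score across systems" as the difficulty estimate are all assumptions made here for illustration.

```python
from statistics import mean


def difficulty_weights(scores_by_system):
    """Estimate one difficulty weight per reference token.

    scores_by_system: list of per-system lists, each holding one score
    in [0, 1] per reference token (hypothetical input format).
    Assumption: difficulty = 1 - mean score across systems, so a token
    that most systems miss gets a large weight.
    """
    n_tokens = len(scores_by_system[0])
    return [
        1.0 - mean(system[i] for system in scores_by_system)
        for i in range(n_tokens)
    ]


def difficulty_aware_score(system_scores, weights):
    """Weighted average of one system's per-token scores."""
    total = sum(weights)
    if total == 0:
        # Every token was easy for every system: fall back to the plain mean.
        return mean(system_scores)
    return sum(w * s for w, s in zip(weights, system_scores)) / total


# Toy data: three hypothetical systems scored on four reference tokens.
system_a = [1.0, 1.0, 1.0, 0.0]
system_b = [1.0, 1.0, 0.0, 1.0]
system_c = [1.0, 1.0, 0.0, 1.0]

weights = difficulty_weights([system_a, system_b, system_c])
score_a = difficulty_aware_score(system_a, weights)
score_b = difficulty_aware_score(system_b, weights)
```

In this toy data all three systems have the same unweighted mean (0.75), yet the weighted score separates them: `system_a` is the only one that gets the third token right, which the other systems miss, so it scores about 0.67 against about 0.33 for `system_b` and `system_c`.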

Code & Models

Repositories

NLP2CT/Difficulty-Aware-MT-Evaluation
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications