Variance-Aware Machine Translation Test Sets
Runzhe Zhan, Xuebo Liu, Derek F. Wong, Lidia S. Chao

TL;DR
This paper introduces variance-aware test sets for machine translation evaluation, automatically created to better correlate with human judgment and highlight challenging linguistic features, aiding future test set construction.
Contribution
It proposes a novel variance-aware filtering method to automatically generate discriminative MT test sets without human labor, improving evaluation reliability.
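The filtering idea can be sketched roughly as follows. This is a minimal illustration, not the authors' released code. It assumes that an instance's discriminativeness is measured by the variance of its per-sentence metric scores across MT systems, and that a fixed fraction of the highest-variance instances is kept. The function name `variance_aware_filter`, the `keep_ratio` parameter, and its default value are hypothetical.

```python
from statistics import pvariance


def variance_aware_filter(scores, keep_ratio=0.4):
    """Return the indices of the most discriminative test instances.

    scores[i][j] is the metric score of system j on test instance i.
    keep_ratio is a hypothetical knob: the fraction of instances to keep.
    """
    if not 0 < keep_ratio <= 1:
        raise ValueError("keep_ratio must be in (0, 1]")
    if not scores:
        return []

    # Variance of each instance's scores across systems: a low value means
    # all systems score about the same, so the instance separates them poorly.
    variances = [pvariance(row) for row in scores]

    # Keep the top fraction by variance, at least one instance.
    n_keep = max(1, round(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: variances[i], reverse=True)

    # Return indices in original order so the filtered set stays aligned
    # with the source and reference files.
    return sorted(ranked[:n_keep])


if __name__ == "__main__":
    # Toy scores: 5 instances x 3 systems (made-up numbers).
    toy = [
        [0.5, 0.5, 0.5],
        [0.1, 0.9, 0.5],
        [0.4, 0.5, 0.6],
        [0.0, 1.0, 0.0],
        [0.3, 0.3, 0.31],
    ]
    print(variance_aware_filter(toy, keep_ratio=0.4))
```

Returning indices rather than sentences keeps the sketch independent of how the test set is stored.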
Findings
VAT correlates better with human judgment than original WMT sets
VAT highlights challenging linguistic features like low-frequency words
The method is applicable across multiple language pairs and test sets
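To make the first finding concrete, one way to compare a filtered test set with the original is to correlate system-level metric scores with human scores on each. The sketch below uses Pearson correlation as an assumption, since this summary does not say which coefficient or metric the paper uses. The helper names `system_scores` and `pearson`, and any human scores passed in, are hypothetical.

```python
from math import sqrt


def system_scores(scores, kept):
    """Average each system's per-instance scores over the kept instances.

    scores[i][j] is the metric score of system j on test instance i.
    """
    if not kept:
        raise ValueError("kept must contain at least one instance index")
    n_systems = len(scores[0])
    return [sum(scores[i][j] for i in kept) / len(kept) for j in range(n_systems)]


def pearson(xs, ys):
    """Pearson correlation between two equal-length score lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (norm_x * norm_y)
```

With per-system human scores in a list `human`, `pearson(system_scores(scores, kept), human)` can be computed once with all instance indices and once with only the filtered indices, and the two values compared.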
Abstract
We release 70 small and discriminative test sets for machine translation (MT) evaluation called variance-aware test sets (VAT), covering 35 translation directions from WMT16 to WMT20 competitions. VAT is automatically created by a novel variance-aware filtering method that filters the indiscriminative test instances of the current MT test sets without any human labor. Experimental results show that VAT outperforms the original WMT test sets in terms of the correlation with human judgement across mainstream language pairs and test sets. Further analysis on the properties of VAT reveals the challenging linguistic features (e.g., translation of low-frequency words and proper nouns) for competitive MT systems, providing guidance for constructing future MT test sets. The test sets and the code for preparing variance-aware MT test sets are freely available at…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
