Unbabel's Participation in the WMT20 Metrics Shared Task
Ricardo Rei, Craig Stewart, Catarina Farinha, Alon Lavie

TL;DR
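Unbabel's team built models on the COMET framework to evaluate machine translation quality on the segment-level, document-level and system-level tracks, as well as the 'QE as a Metric' track. Evaluated on test sets from the previous year, the systems achieve strong results for all language pairs and in many cases set a new state of the art.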
Contribution
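The paper introduces several estimator models trained to regress on different human-generated quality scores, a novel ranking model trained on relative ranks obtained from Direct Assessments, and a simple technique for converting segment-level predictions into a document-level score, all built on the COMET framework.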
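To make the idea of training on relative ranks concrete, here is a minimal sketch of a triplet margin objective, in which a better-ranked hypothesis is pulled closer to an anchor embedding than a worse-ranked one. This summary does not state the loss used, so the objective, the Euclidean distance, the margin value, and the function names `euclidean` and `triplet_margin_loss` are all assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Sequence


def euclidean(u: Sequence[float], v: Sequence[float]) -> float:
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_margin_loss(
    anchor: Sequence[float],
    better: Sequence[float],
    worse: Sequence[float],
    margin: float = 1.0,  # hypothetical value, not taken from the paper
) -> float:
    """Hinge loss that is zero once `better` is closer to `anchor`
    than `worse` is, by at least `margin`."""
    return max(0.0, euclidean(anchor, better) - euclidean(anchor, worse) + margin)


# Toy 2-d embeddings: the better hypothesis sits near the anchor.
anchor = [0.0, 0.0]
better = [0.0, 1.0]   # distance 1 from the anchor
worse = [3.0, 4.0]    # distance 5 from the anchor

loss_correct_order = triplet_margin_loss(anchor, better, worse)
loss_wrong_order = triplet_margin_loss(anchor, worse, better)
```

With the toy vectors above, the correctly ordered pair incurs no loss, while swapping the two hypotheses produces a positive penalty.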
Findings
Achieved strong results for all language pairs on test sets from the previous year
Set a new state of the art in many cases
Demonstrated the effectiveness of COMET-based estimator and ranking models
Abstract
We present the contribution of the Unbabel team to the WMT 2020 Shared Task on Metrics. We intend to participate on the segment-level, document-level and system-level tracks on all language pairs, as well as the 'QE as a Metric' track. Accordingly, we illustrate results of our models in these tracks with reference to test sets from the previous year. Our submissions build upon the recently proposed COMET framework: We train several estimator models to regress on different human-generated quality scores and a novel ranking model trained on relative ranks obtained from Direct Assessments. We also propose a simple technique for converting segment-level predictions into a document-level score. Overall, our systems achieve strong results for all language pairs on previous test sets and in many cases set a new state-of-the-art.
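The abstract does not spell out how segment-level predictions are converted into a document-level score. One simple possibility is an average of the segment scores, optionally weighted by segment length so that longer segments count for more. The sketch below assumes that scheme; the function name `document_score` and the length weighting are illustrative assumptions, not a description of the authors' method.

```python
from typing import Optional, Sequence


def document_score(
    segment_scores: Sequence[float],
    segment_lengths: Optional[Sequence[int]] = None,
) -> float:
    """Aggregate segment-level scores into one document-level score.

    With no lengths given this is a plain mean; otherwise each segment
    is weighted by its length (an assumed weighting, e.g. token counts).
    """
    if not segment_scores:
        raise ValueError("a document needs at least one segment score")
    if segment_lengths is None:
        return sum(segment_scores) / len(segment_scores)
    if len(segment_lengths) != len(segment_scores):
        raise ValueError("scores and lengths must have the same size")
    total = sum(segment_lengths)
    if total <= 0:
        raise ValueError("segment lengths must sum to a positive number")
    weighted = sum(s * n for s, n in zip(segment_scores, segment_lengths))
    return weighted / total


# Two segments: a short low-scoring one and a longer high-scoring one.
plain = document_score([0.0, 1.0])             # unweighted mean
weighted = document_score([0.0, 1.0], [1, 3])  # longer segment dominates
```

Weighting by length keeps a very short segment from moving the document score as much as a long one; whether that is desirable depends on how the human document-level judgments were collected.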
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Semantic Web and Ontologies
