COMET: A Neural Framework for MT Evaluation

Ricardo Rei; Craig Stewart; Ana C Farinha; Alon Lavie

arXiv:2009.09025 · cs.CL · October 20, 2020

TL;DR

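COMET is a neural framework for training multilingual machine translation (MT) evaluation models. It builds on cross-lingual pretrained language models and uses both the source input and a reference translation to predict MT quality, reaching state-of-the-art correlation with human judgements on the WMT 2019 Metrics shared task.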

Contribution

The paper introduces COMET, a neural framework for training MT evaluation models that build on cross-lingual pretrained language models and use both the source input and a target-language reference translation to predict MT quality.

Findings

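01 Achieves new state-of-the-art correlation with human judgements.

02 Trains models on three types of human judgement data: Direct Assessments, Human-mediated Translation Edit Rate, and Multidimensional Quality Metrics.

03 Achieves state-of-the-art performance on the WMT 2019 Metrics shared task and is robust to high-performing systems.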
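As a rough illustration of what "correlation with human judgements" can mean at the segment level, the sketch below computes a Kendall's Tau-like statistic over pairs of translations where humans preferred one over the other. The function name, the example scores, and the choice to count metric ties as disagreements are assumptions made for this sketch, not details stated on this page.

```python
from typing import Iterable, Tuple


def kendall_tau_like(pairs: Iterable[Tuple[float, float]]) -> float:
    """Agreement between a metric and human relative rankings.

    Each pair holds (metric score of the translation humans preferred,
    metric score of the translation humans ranked lower).
    Returns (concordant - discordant) / (concordant + discordant).
    """
    concordant = 0
    discordant = 0
    for better_score, worse_score in pairs:
        if better_score > worse_score:
            concordant += 1  # metric agrees with the human preference
        else:
            discordant += 1  # assumption: ties count against the metric
    total = concordant + discordant
    if total == 0:
        raise ValueError("need at least one judged pair")
    return (concordant - discordant) / total


# Hypothetical metric scores for four human-ranked pairs.
example_pairs = [(0.82, 0.40), (0.55, 0.61), (0.90, 0.10), (0.33, 0.20)]
tau = kendall_tau_like(example_pairs)
print(tau)
```

A value of 1.0 would mean the metric orders every pair the same way the human annotators did, and -1.0 would mean it always disagrees.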

Abstract

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metrics. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.
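To make the idea of exploiting both the source and the reference more concrete, here is a minimal sketch of how three sentence embeddings could be combined into one feature vector and scored. It assumes the embeddings have already been produced by some pretrained cross-lingual encoder. The function names, the particular feature combination, and the linear scoring layer are illustrative assumptions, not the authors' implementation.

```python
from typing import List, Sequence


def combine_features(
    src: Sequence[float], hyp: Sequence[float], ref: Sequence[float]
) -> List[float]:
    """Build one feature vector from source, hypothesis, and reference embeddings.

    Assumed layout: hypothesis, reference, element-wise products of the
    hypothesis with source and reference, then absolute differences.
    """
    if not (len(src) == len(hyp) == len(ref)):
        raise ValueError("embeddings must share one dimensionality")
    prod_src = [h * s for h, s in zip(hyp, src)]
    prod_ref = [h * r for h, r in zip(hyp, ref)]
    diff_src = [abs(h - s) for h, s in zip(hyp, src)]
    diff_ref = [abs(h - r) for h, r in zip(hyp, ref)]
    return list(hyp) + list(ref) + prod_src + prod_ref + diff_src + diff_ref


def linear_score(
    features: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Stand-in for a trained regressor: a single linear layer."""
    if len(features) != len(weights):
        raise ValueError("one weight per feature is required")
    return sum(f * w for f, w in zip(features, weights)) + bias


# Toy 2-dimensional embeddings (hypothetical values).
src = [1.0, 0.0]
hyp = [0.5, 0.5]
ref = [1.0, 1.0]

features = combine_features(src, hyp, ref)
score = linear_score(features, [1.0] * len(features))
print(len(features), score)
```

In a real model the weights would be learned by regressing on human judgements such as Direct Assessments, and the linear layer would typically be replaced by a deeper feed-forward network.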

Code & Models

Repositories

Unbabel/COMET (PyTorch, official)

Datasets

mario-rc/dstc11.t4 (dataset, 119 downloads)
