COMET: A Neural Framework for MT Evaluation
Ricardo Rei, Craig Stewart, Ana C Farinha, Alon Lavie

TL;DR
COMET is a neural framework for training machine translation evaluation models on top of cross-lingual pretrained language models. Its models use both the source input and a reference translation, and they achieve state-of-the-art correlation with human judgments on the WMT 2019 Metrics shared task.
Contribution
The paper introduces COMET, a neural framework for training multilingual MT evaluation models. It builds on cross-lingual pretrained language models and uses both the source input and a target-language reference translation to predict MT quality.
Findings
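Achieves new state-of-the-art correlation with human judgments on the WMT 2019 Metrics shared task.
Trains three models on different types of human judgments: Direct Assessments, Human-mediated Translation Edit Rate, and Multidimensional Quality Metrics.
Demonstrates robustness to high-performing MT systems.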
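The findings above are stated in terms of correlation with human judgments. This summary does not say which statistic is used, so the sketch below assumes a Kendall's Tau-like formulation over pairwise human preferences, a common choice for segment-level metric evaluation. The function name, the toy scores, and the rule that ties count as discordant are all illustrative assumptions, not details taken from the paper.

```python
from typing import Dict, List, Tuple


def kendall_tau_like(
    metric_scores: Dict[str, float],
    human_preferences: List[Tuple[str, str]],
) -> float:
    """Return (concordant - discordant) / (concordant + discordant).

    Each pair in human_preferences is (better_id, worse_id) as judged by
    human annotators. A pair is concordant when the metric also scores the
    better translation strictly higher. Ties in metric score are counted
    as discordant (an assumption made for this sketch).
    """
    concordant = 0
    discordant = 0
    for better, worse in human_preferences:
        if metric_scores[better] > metric_scores[worse]:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("need at least one human preference pair")
    return (concordant - discordant) / total


# Hypothetical toy data: metric scores for three translations of one source
# sentence, plus three human pairwise preferences over them.
scores = {"hyp_a": 0.82, "hyp_b": 0.41, "hyp_c": 0.65}
preferences = [("hyp_a", "hyp_b"), ("hyp_a", "hyp_c"), ("hyp_b", "hyp_c")]

tau = kendall_tau_like(scores, preferences)
print(f"Kendall's Tau-like agreement: {tau:.3f}")
```

In this toy case the metric agrees with two of the three human preferences, so the statistic is (2 - 1) / 3. A value of 1 would mean the metric ranks every pair the way the annotators did, and -1 would mean it reverses every pair.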
Abstract
We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgements. Our framework leverages recent breakthroughs in cross-lingual pretrained language modeling resulting in highly multilingual and adaptable MT evaluation models that exploit information from both the source input and a target-language reference translation in order to more accurately predict MT quality. To showcase our framework, we train three models with different types of human judgements: Direct Assessments, Human-mediated Translation Edit Rate and Multidimensional Quality Metrics. Our models achieve new state-of-the-art performance on the WMT 2019 Metrics shared task and demonstrate robustness to high-performing systems.
