Pitfalls and Outlooks in Using COMET

Vilém Zouhar; Pinzhen Chen; Tsz Kin Lam; Nikita Moghe; Barry Haddow

arXiv:2408.15366·cs.CL·October 1, 2024

TL;DR

This paper examines the pitfalls of using the COMET metric in machine translation, highlighting technical, data-related, and reporting issues, and proposes solutions to improve its reliability and comparability.

Contribution

The paper identifies key pitfalls in how COMET is used and reported, and introduces sacreCOMET to standardize configurations and improve metric reliability.

Findings

01

COMET scores vary with the technical setup, including obsolete software versions and compute precision.

02

Data issues such as empty content, language mismatch, and translationese at test time affect COMET's reliability.

03

Standardized reporting can improve comparability across studies.

Abstract

The COMET metric has blazed a trail in the machine translation community, given its strong correlation with human judgements of translation quality. Its success stems from being a modified pre-trained multilingual model finetuned for quality assessment. However, it being a machine learning model also gives rise to a new set of pitfalls that may not be widely known. We investigate these unexpected behaviours from three aspects: 1) technical: obsolete software versions and compute precision; 2) data: empty content, language mismatch, and translationese at test time as well as distribution and domain biases in training; 3) usage and reporting: multi-reference support and model referencing in the literature. All of these problems imply that COMET scores are not comparable between papers or even technical setups and we put forward our perspective on fixing each issue. Furthermore, we release…
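One of the data pitfalls named in the abstract, empty content at test time, can be screened for before any scoring takes place. The following is a minimal sketch in plain Python and does not call COMET or sacreCOMET; the `Segment` class and the `find_empty_segments` helper are hypothetical names introduced here for illustration, not part of either tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """One test item: source, machine translation, and reference (hypothetical container)."""
    src: str
    mt: str
    ref: str


def find_empty_segments(segments):
    """Return (index, field) pairs for fields that are empty or whitespace-only."""
    problems = []
    for i, seg in enumerate(segments):
        for field in ("src", "mt", "ref"):
            if not getattr(seg, field).strip():
                problems.append((i, field))
    return problems


# Toy data, made up for this example.
segments = [
    Segment(src="Guten Morgen.", mt="Good morning.", ref="Good morning."),
    Segment(src="Wie geht's?", mt="", ref="How are you?"),
    Segment(src="Danke.", mt="Thanks.", ref="   "),
]

print(find_empty_segments(segments))  # → [(1, 'mt'), (2, 'ref')]
```

Flagged segments can then be inspected or reported separately, instead of being passed to a learned metric whose behaviour on empty input may be unexpected.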

Code & Models

Repositories

PinzhenChen/sacreCOMET (Official)

Videos

Pitfalls and Outlooks in Using COMET · underline

Taxonomy

Topics: Distributed and Parallel Computing Systems · Embedded Systems Design Techniques

Methods: Sparse Evolutionary Training