COMET-poly: Machine Translation Metric Grounded in Other Candidates

Maike Züfle; Vilém Zouhar; Tu Anh Dinh; Felipe Maia Polo; Jan Niehues; Mrinmaya Sachan

arXiv:2508.18549 · cs.CL · August 27, 2025

TL;DR

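COMET-poly introduces two machine translation evaluation metrics that look beyond a single source-translation pair. One also takes in alternative candidate translations of the same source; the other takes in translations of similar source texts together with their human quality scores. The aim in both cases is quality assessment that correlates better with human judgment.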

Contribution

The paper presents two novel metrics, COMET-polycand and COMET-polyic, that leverage additional translation candidates or retrieved examples to improve automatic translation quality evaluation.

Findings

01 Including additional translations improves correlation with human judgment.

02 More retrieved examples further enhance metric performance.

03 Models are publicly available for use and benchmarking.
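The findings above are stated in terms of correlation with human judgment. As a rough illustration of what that means, the sketch below computes two common agreement statistics between metric scores and human scores. It is an assumption that these are representative: this page does not say which correlation statistic the paper reports, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length score lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    concordant = discordant = 0
    for i, j in combinations(range(len(x)), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


if __name__ == "__main__":
    # Made-up toy scores for illustration only, not results from the paper.
    human = [1.0, 2.0, 3.0, 4.0]
    metric_a = [1.0, 3.0, 2.0, 4.0]  # swaps one pair relative to human ranking
    metric_b = [1.0, 2.0, 3.0, 4.0]  # same ranking as human
    print("metric_a tau:", kendall_tau_a(human, metric_a))
    print("metric_b tau:", kendall_tau_a(human, metric_b))
```

A metric whose scores rank translations in the same order as the human scores gets a higher tau, which is the sense in which a metric can be said to be closer to human judgment.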

Abstract

Automated metrics for machine translation attempt to replicate human judgment. Unlike humans, who often assess a translation in the context of multiple alternatives, these metrics typically consider only the source sentence and a single translation. This discrepancy in the evaluation setup may negatively impact the performance of automated metrics. We propose two automated metrics that incorporate additional information beyond the single translation. COMET-polycand uses alternative translations of the same source sentence to compare and contrast with the translation at hand, thereby providing a more informed assessment of its quality. COMET-polyic, inspired by retrieval-based in-context learning, takes in translations of similar source texts along with their human-labeled quality scores to guide the evaluation. We find that including a single additional translation in COMET-polycand…
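To make the two input setups in the abstract concrete, here is a minimal sketch of how the extra context for each metric might be assembled. Everything in it is an assumption for illustration: the class and function names are hypothetical, the token-overlap retrieval is a simple stand-in (the abstract does not say how similar source texts are retrieved), and no scoring model is involved. It is not the authors' implementation or the released models' API.

```python
from dataclasses import dataclass, field


@dataclass
class PolycandInput:
    """Hypothetical input: the translation to score plus alternative candidates."""
    source: str
    translation: str
    other_candidates: list[str] = field(default_factory=list)


@dataclass
class ScoredExample:
    """Hypothetical example: a translation of some source with a human score."""
    source: str
    translation: str
    human_score: float


@dataclass
class PolyicInput:
    """Hypothetical input: the translation to score plus retrieved scored examples."""
    source: str
    translation: str
    examples: list[ScoredExample] = field(default_factory=list)


def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity; a stand-in for a real retriever (assumption)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def retrieve_examples(source: str, pool: list[ScoredExample], k: int = 2):
    """Return the k pool examples whose source text is most similar to `source`."""
    ranked = sorted(pool, key=lambda ex: jaccard(source, ex.source), reverse=True)
    return ranked[:k]


def build_polycand_input(source, translation, all_candidates, max_extra=1):
    """Pair the translation with up to `max_extra` other candidates for the same source."""
    others = [c for c in all_candidates if c != translation]
    return PolycandInput(source, translation, others[:max_extra])


def build_polyic_input(source, translation, pool, k=2):
    """Pair the translation with k retrieved examples and their human scores."""
    return PolyicInput(source, translation, retrieve_examples(source, pool, k))
```

In the first setup the additional context comes from the same source sentence; in the second it comes from different but similar source texts whose translations already carry human-labeled quality scores. A trained model would then consume these inputs to predict a quality score, which this sketch does not attempt.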
