BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation

Taisiya Glushkova; Chrysoula Zerva; André F. T. Martins

arXiv:2305.19144·cs.CL·May 31, 2023·2 cites

TL;DR

This paper proposes combining neural and lexical evaluation metrics for machine translation to improve robustness against critical errors like entity and number deviations, enhancing correlation with human judgments.

Contribution

It introduces methods to integrate lexical and neural metrics using additional training features, improving detection of critical translation errors.

Findings

01 Combined metrics better detect entity and number errors.

02 Enhanced metrics show higher correlation with human judgments.

03 Approach improves robustness across multiple language pairs.

Abstract

Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable in detecting certain phenomena that can be considered as critical errors, such as deviations in entities and numbers. In contrast, traditional evaluation metrics, such as BLEU or chrF, which measure lexical or character overlap between translation hypotheses and human references, have lower correlations with human judgements but are sensitive to such deviations. In this paper, we investigate several ways of combining the two approaches in order to increase robustness of state-of-the-art evaluation methods to translations with critical errors. We show that by using additional information during training, such as sentence-level features and word-level tags, the trained metrics improve their capability to penalize…
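
To make the intuition in the abstract concrete, the following is a minimal sketch of why lexical signals are sensitive to number deviations. It is not the authors' method: the paper feeds additional information into training, whereas this sketch applies a simple post-hoc combination. The function names, the simplified character n-gram F1 (a stand-in for chrF, not the official score), the interpolation weight, and the penalty value are all assumptions made for illustration. The neural score is assumed to be a number in [0, 1] supplied by some external metric.

```python
import re
from collections import Counter


def char_ngrams(text, n):
    """Return a multiset of the character n-grams in `text`."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def char_ngram_f1(hypothesis, reference, n=3):
    """Simplified character n-gram F1 (illustrative stand-in for chrF)."""
    hyp, ref = char_ngrams(hypothesis, n), char_ngrams(reference, n)
    overlap = sum((hyp & ref).values())  # clipped n-gram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(hyp.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def numbers_match(hypothesis, reference):
    """True when both sentences contain the same multiset of numeric tokens."""
    pattern = r"\d+(?:[.,]\d+)*"
    return Counter(re.findall(pattern, hypothesis)) == Counter(
        re.findall(pattern, reference)
    )


def combined_score(neural_score, hypothesis, reference,
                   weight=0.5, number_penalty=0.5):
    """Hypothetical combination: interpolate, then penalise number mismatches."""
    lexical = char_ngram_f1(hypothesis, reference)
    score = weight * neural_score + (1 - weight) * lexical
    if not numbers_match(hypothesis, reference):
        score *= number_penalty  # assumed penalty, chosen only for illustration
    return score


reference = "The company hired 250 people in 2021."
good = "The company hired 250 people in 2021."
bad = "The company hired 520 people in 2021."

# Same assumed neural score for both hypotheses; only the lexical side differs.
print(combined_score(0.9, good, reference))
print(combined_score(0.9, bad, reference))
```

Under these assumptions, the hypothesis with the swapped number receives a lower combined score even when the neural score is held fixed, which is the kind of sensitivity to number deviations the abstract attributes to lexical metrics.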

Code & Models

Repositories

deep-spin/robust_mt_evaluation
PyTorch · Official

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)