BLEU Meets COMET: Combining Lexical and Neural Metrics Towards Robust Machine Translation Evaluation
Taisiya Glushkova, Chrysoula Zerva, André F. T. Martins

TL;DR
This paper proposes combining neural and lexical evaluation metrics for machine translation to improve robustness against critical errors like entity and number deviations, enhancing correlation with human judgments.
Contribution
The paper integrates lexical information into neural metrics by supplying additional signals during training, such as sentence-level features and word-level tags, which improves the detection of critical translation errors.
Findings
Combined metrics better detect entity and number errors.
Enhanced metrics show higher correlation with human judgments.
Approach improves robustness across multiple language pairs.
Abstract
Although neural-based machine translation evaluation metrics, such as COMET or BLEURT, have achieved strong correlations with human judgements, they are sometimes unreliable in detecting certain phenomena that can be considered as critical errors, such as deviations in entities and numbers. In contrast, traditional evaluation metrics, such as BLEU or chrF, which measure lexical or character overlap between translation hypotheses and human references, have lower correlations with human judgements but are sensitive to such deviations. In this paper, we investigate several ways of combining the two approaches in order to increase robustness of state-of-the-art evaluation methods to translations with critical errors. We show that by using additional information during training, such as sentence-level features and word-level tags, the trained metrics improve their capability to penalize…
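The abstract's claim that overlap-based metrics are sensitive to number deviations can be illustrated with a minimal sketch. The code below is not the paper's method and not the official chrF implementation: `simple_chrf` is a simplified character n-gram F-score written for this example, `combine` is a hypothetical weighted average of a neural and a lexical score, and the example sentences are invented. The neural score is passed in as a plain number because no neural model is loaded here.

```python
from collections import Counter


def char_ngrams(text: str, n: int) -> Counter:
    """Count character n-grams, ignoring spaces (a simplification)."""
    chars = text.replace(" ", "")
    return Counter(chars[i:i + n] for i in range(len(chars) - n + 1))


def simple_chrf(hypothesis: str, reference: str,
                max_n: int = 4, beta: float = 2.0) -> float:
    """Simplified chrF-style score: F-beta on character n-gram overlap,
    averaged over n = 1..max_n. Illustrative only."""
    scores = []
    for n in range(1, max_n + 1):
        hyp = char_ngrams(hypothesis, n)
        ref = char_ngrams(reference, n)
        if not hyp or not ref:
            continue  # text too short for this n-gram order
        overlap = sum((hyp & ref).values())  # clipped n-gram matches
        precision = overlap / sum(hyp.values())
        recall = overlap / sum(ref.values())
        if precision + recall == 0:
            scores.append(0.0)
            continue
        f_beta = ((1 + beta ** 2) * precision * recall
                  / (beta ** 2 * precision + recall))
        scores.append(f_beta)
    return sum(scores) / len(scores) if scores else 0.0


def combine(neural_score: float, lexical_score: float,
            weight: float = 0.5) -> float:
    """Hypothetical weighted average of two scores assumed to lie in [0, 1]."""
    return weight * neural_score + (1 - weight) * lexical_score


# Invented example: the second hypothesis changes two numbers only.
reference = "The company reported revenue of 250 million euros in 2021."
correct = "The company reported revenue of 250 million euros in 2021."
wrong_number = "The company reported revenue of 950 million euros in 2027."

print(simple_chrf(correct, reference))
print(simple_chrf(wrong_number, reference))
```

The hypothesis with altered numbers scores strictly lower than the exact match, because every character n-gram that spans a changed digit no longer matches the reference. A weighted average such as `combine` is only one simple way to merge the two signals; the abstract instead describes feeding additional information to the metric during training.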
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)
