TL;DR
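KoBE is a reference-free machine translation evaluation method that grounds the entity mentions in each source sentence and candidate translation against a multilingual knowledge base, then scores the candidate by entity recall. It achieves the highest correlation with human judgments on 9 of the 18 language pairs in the WMT19 benchmark for evaluation without references, and a higher correlation than BLEU on 4 language pairs.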
Contribution
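The paper presents an evaluation method based on entity grounding that requires no reference translations: entity mentions in the source sentence and in the candidate translation are grounded against a large-scale multilingual knowledge base, and the candidate is scored by the recall of the source's grounded entities. The method wins on 9 of the 18 WMT19 language pairs for evaluation without references, the largest number of wins for a single method on that task.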
Findings
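Achieves the highest correlation with human judgments on 9 of the 18 language pairs in the WMT19 benchmark for evaluation without references.
Achieves higher correlation with human judgments than BLEU on 4 language pairs.
Releases a dataset of 1.8 million grounded entity mentions across 18 language pairs from the WMT19 metrics track data.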
Abstract
We propose a simple and effective method for machine translation evaluation which does not require reference translations. Our approach is based on (1) grounding the entity mentions found in each source sentence and candidate translation against a large-scale multilingual knowledge base, and (2) measuring the recall of the grounded entities found in the candidate vs. those found in the source. Our approach achieves the highest correlation with human judgements on 9 out of the 18 language pairs from the WMT19 benchmark for evaluation without references, which is the largest number of wins for a single evaluation method on this task. On 4 language pairs, we also achieve higher correlation with human judgements than BLEU. To foster further research, we release a dataset containing 1.8 million grounded entity mentions across 18 language pairs from the WMT19 metrics track data.
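The scoring step (2) in the abstract can be illustrated with a minimal sketch. It assumes the grounding step (1) has already been run and has produced a list of knowledge-base entity IDs for the source and for the candidate. The function name `entity_recall`, the IDs such as "E1", the multiset matching, and the score returned for a source with no entities are all assumptions made for illustration, not details taken from the paper.

```python
from collections import Counter


def entity_recall(source_entities, candidate_entities):
    """Fraction of the source's grounded entities that also appear in the candidate.

    Both arguments are lists of knowledge-base entity IDs (hypothetical here).
    Matching is done on multisets, so an entity mentioned twice in the source
    must be found twice in the candidate to count fully (an assumption).
    """
    src = Counter(source_entities)
    cand = Counter(candidate_entities)
    total = sum(src.values())
    if total == 0:
        # Assumption: a source with no grounded entities gives no evidence
        # of an error, so it gets a perfect score.
        return 1.0
    # Counter intersection keeps the minimum count of each shared entity.
    matched = sum((src & cand).values())
    return matched / total


# Hypothetical entity IDs produced by an entity linker (not shown here).
source_ids = ["E1", "E2", "E3"]
candidate_ids = ["E1", "E3", "E7"]
print(entity_recall(source_ids, candidate_ids))
```

Because the IDs come from a shared multilingual knowledge base, the same entity gets the same ID whatever language it is mentioned in, which is what lets the source and the candidate be compared without a reference translation.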
