Don't Rank, Combine! Combining Machine Translation Hypotheses Using Quality Estimation
Giorgos Vernikos, Andrei Popescu-Belis

TL;DR
QE-fusion combines spans from multiple machine translation hypotheses, guided by a quality estimation (QE) metric, and improves translation quality across several LLMs, a multilingual translation model, and five language pairs.
Contribution
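QE-fusion synthesizes a translation from a pool of sampled candidates by combining their spans under a quality estimation metric, which correlates better with human judgments than the model's own probability estimates. It outperforms beam search and reranking techniques such as Minimum Bayes Risk decoding and QE-reranking.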
Findings
QE-fusion consistently improves translation quality, as measured by COMET and BLEURT scores, over beam search, Minimum Bayes Risk decoding, and QE-reranking.
The method generates more diverse and novel translations.
QE-fusion scales linearly with the number of candidates in the pool.
Abstract
Neural machine translation systems estimate probabilities of target sentences given source sentences, yet these estimates may not align with human preferences. This work introduces QE-fusion, a method that synthesizes translations using a quality estimation metric (QE), which correlates better with human judgments. QE-fusion leverages a pool of candidates sampled from a model, combining spans from different candidates using a QE metric such as CometKiwi. We compare QE-fusion against beam search and recent reranking techniques, such as Minimum Bayes Risk decoding or QE-reranking. Our method consistently improves translation quality in terms of COMET and BLEURT scores when applied to large language models (LLMs) used for translation (PolyLM, XGLM, Llama2, Mistral, ALMA, and Tower) and to multilingual translation models (NLLB), over five language pairs. Notably, QE-fusion exhibits larger…
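As a rough illustration of the idea described in the abstract, the sketch below contrasts QE-reranking (pick the best-scoring candidate) with a simplified, greedy span fusion. It is not the authors' implementation: `toy_qe` is a hypothetical stand-in for a real QE metric such as CometKiwi, divergent spans are found with Python's `difflib`, and the names `qe_rerank` and `fuse` are assumptions made for this example.

```python
from difflib import SequenceMatcher
from typing import Callable, List

# A QE scorer maps (source, hypothesis) to a quality score; no reference needed.
QEScorer = Callable[[str, str], float]

# Hypothetical toy scorer: counts tokens from a fixed "preferred" vocabulary.
# A real system would call a learned QE model here instead.
PREFERRED = {"the", "cat", "sat", "on", "mat"}


def toy_qe(source: str, hypothesis: str) -> float:
    return float(sum(tok in PREFERRED for tok in hypothesis.split()))


def qe_rerank(source: str, candidates: List[str], qe: QEScorer) -> str:
    """Baseline: return the single candidate with the highest QE score."""
    return max(candidates, key=lambda hyp: qe(source, hyp))


def fuse(source: str, candidates: List[str], qe: QEScorer) -> str:
    """Greedy span fusion (a simplification, assumed for illustration).

    Start from the top-ranked candidate, then try replacing each span where
    another candidate differs, keeping a replacement only if QE improves.
    """
    best = qe_rerank(source, candidates, qe).split()
    best_score = qe(source, " ".join(best))
    for cand in candidates:
        other = cand.split()
        opcodes = SequenceMatcher(a=best, b=other, autojunk=False).get_opcodes()
        # Walk right to left so earlier span indices stay valid after a swap.
        for tag, i1, i2, j1, j2 in reversed(opcodes):
            if tag == "equal":
                continue
            trial = best[:i1] + other[j1:j2] + best[i2:]
            score = qe(source, " ".join(trial))
            if score > best_score:
                best, best_score = trial, score
    return " ".join(best)


SOURCE = "die Katze saß auf der Matte"  # unused by the toy scorer
CANDIDATES = ["the cat sit on the mat", "a cat sat on a mat"]

print(qe_rerank(SOURCE, CANDIDATES, toy_qe))  # the cat sit on the mat
print(fuse(SOURCE, CANDIDATES, toy_qe))       # the cat sat on the mat
```

With this toy scorer, reranking can only return a sentence already in the pool, whereas fusion yields "the cat sat on the mat", which neither candidate contains. A realistic setup would replace `toy_qe` with a learned QE model and could keep several hypotheses alive (a beam) rather than a single greedy one.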
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Explainable Artificial Intelligence (XAI)
Methods: ALIGN
