Mismatching-Aware Unsupervised Translation Quality Estimation For Low-Resource Languages
Fatemeh Azadi, Heshaam Faili, Mohammad Javad Dousti

TL;DR
This paper introduces XLMRScore, an unsupervised translation quality estimation metric tailored for low-resource languages, addressing issues of untranslated tokens and mismatching errors, and demonstrating competitive results against supervised methods.
Contribution
Proposes XLMRScore, a novel unsupervised QE metric that improves translation quality estimation for low-resource languages by handling untranslated tokens and cross-lingual mismatches.
Findings
Achieves comparable results to supervised methods in zero-shot scenarios.
Outperforms existing unsupervised QE methods on low-resource language pairs.
Demonstrates effectiveness on WMT21 datasets and a new English-Persian dataset.
Abstract
Translation Quality Estimation (QE) is the task of predicting the quality of machine translation (MT) output without any reference. This task has gained increasing attention as an important component in the practical applications of MT. In this paper, we first propose XLMRScore, which is a cross-lingual counterpart of BERTScore computed via the XLM-RoBERTa (XLMR) model. This metric can be used as a simple unsupervised QE method, but it faces two issues: first, untranslated tokens lead to unexpectedly high translation scores; second, mismatching errors arise between source and hypothesis tokens when greedy matching is applied in XLMRScore. To mitigate these issues, we suggest, respectively, replacing untranslated words with the unknown token, and cross-lingually aligning the pre-trained model so that aligned words are represented closer to each other. We evaluate…
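The greedy matching and the untranslated-token fix described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the toy three-dimensional vectors stand in for XLM-RoBERTa embeddings, the function names (`greedy_f1`, `mask_untranslated`) are hypothetical, and the rule "a hypothesis token is untranslated if it appears verbatim in the source" is an assumption made only for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def greedy_f1(src_vecs, hyp_vecs):
    """BERTScore-style F1 with the source playing the role of the reference.

    Each token is greedily matched to its most similar token on the other side.
    """
    recall = sum(max(cosine(s, h) for h in hyp_vecs) for s in src_vecs) / len(src_vecs)
    precision = sum(max(cosine(h, s) for s in src_vecs) for h in hyp_vecs) / len(hyp_vecs)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def mask_untranslated(src_tokens, hyp_tokens, unk="<unk>"):
    """Replace hypothesis tokens copied verbatim from the source (assumed criterion)."""
    src_set = set(src_tokens)
    return [unk if tok in src_set else tok for tok in hyp_tokens]

# Toy embeddings (assumed values); real scores would use contextual XLM-R vectors.
EMB = {
    "the": [1.0, 0.0, 0.0], "cat": [0.0, 1.0, 0.0],
    "le": [0.9, 0.1, 0.0], "chat": [0.1, 0.9, 0.0],
    "<unk>": [0.0, 0.0, 1.0],
}

def score(src_tokens, hyp_tokens):
    return greedy_f1([EMB[t] for t in src_tokens], [EMB[t] for t in hyp_tokens])

src = ["the", "cat"]
translated = ["le", "chat"]
copied = ["the", "cat"]  # MT output that left the source untranslated

print("translated:", score(src, translated))
print("copied, unmasked:", score(src, copied))
print("copied, masked:", score(src, mask_untranslated(src, copied)))
```

In this toy setting the copied output gets the highest score until its tokens are masked, which mirrors the first issue named in the abstract. The second fix, cross-lingual alignment of the pre-trained model, changes the embeddings themselves and is not shown here.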
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
Methods: Test
