TL;DR
Verdi is a quality estimation framework that combines dual learning with transformer-based models to estimate translation quality and detect word-level errors in bilingual corpora, supporting both post-editing and parallel corpus cleaning.
Contribution
Verdi introduces a dual learning scheme for the NMT predictor and a new feature that encodes translated target information; the paper reports that it outperforms existing methods on quality estimation tasks.
Findings
Verdi outperforms the winner of the WMT20 QE competition.
Using Verdi improves parallel corpus cleaning and training efficiency.
The dual learning approach enhances context prediction in bilingual quality estimation.
Abstract
Translation quality estimation is critical to reducing post-editing effort in machine translation and to cross-lingual corpus cleaning. As a research problem, quality estimation (QE) aims to directly estimate the quality of a translation given a pair of source and target sentences, and to highlight the words that need correction, without reference to gold-standard translations. In this paper, we propose Verdi, a novel framework for word-level and sentence-level post-editing effort estimation for bilingual corpora. Verdi adopts two word predictors so that diverse features can be extracted from a pair of sentences for subsequent quality estimation: a transformer-based neural machine translation (NMT) model and a pre-trained cross-lingual language model (XLM). We exploit the symmetric nature of bilingual corpora and apply model-level dual learning in the NMT predictor, which handles…
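To make the two prediction targets concrete, here is a minimal sketch of how word-level and sentence-level post-editing effort labels are commonly derived. It assumes WMT-style labels (per-token OK/BAD tags and an HTER score) computed from a machine translation and its human post-edit. It is not Verdi's model, which predicts these quantities without access to the post-edit. The function names `edit_alignment` and `hter_and_tags` are hypothetical, and the alignment uses plain Levenshtein distance, omitting the shift operation that full TER includes.

```python
from typing import List, Tuple


def edit_alignment(mt: List[str], pe: List[str]) -> Tuple[int, List[str]]:
    """Return (edit count, OK/BAD tag per MT token) from a Levenshtein alignment."""
    n, m = len(mt), len(pe)
    # dist[i][j] = minimum number of edits turning mt[:i] into pe[:j]
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dist[i][0] = i
    for j in range(m + 1):
        dist[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if mt[i - 1] == pe[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # delete an MT token
                dist[i][j - 1] + 1,         # insert a missing token
                dist[i - 1][j - 1] + cost,  # keep or substitute
            )

    # Backtrace: an MT token is BAD if it was substituted or deleted.
    tags = ["OK"] * n
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and mt[i - 1] == pe[j - 1] and dist[i][j] == dist[i - 1][j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and j > 0 and dist[i][j] == dist[i - 1][j - 1] + 1:
            tags[i - 1] = "BAD"
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            tags[i - 1] = "BAD"
            i -= 1
        else:
            j -= 1  # insertion: no MT token to tag in this simplified scheme
    return dist[n][m], tags


def hter_and_tags(mt_sentence: str, pe_sentence: str) -> Tuple[float, List[str]]:
    """Simplified HTER (edits / post-edit length, no shifts) plus word tags."""
    mt, pe = mt_sentence.split(), pe_sentence.split()
    edits, tags = edit_alignment(mt, pe)
    return edits / max(len(pe), 1), tags
```

Calling `hter_and_tags("a black dog runs", "a brown dog runs")` marks only `black` as BAD and yields a sentence score of one edit over four post-edited words. A QE system is trained to predict such tags and scores from the source and target sentences alone.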
