Tailoring Domain Adaptation for Machine Translation Quality Estimation
Javad Pourmostafa Roshan Sharami, Dimitar Shterionov, Frédéric Blain, Eva Vanmassenhove, Mirella De Sisto, Chris Emmery, Pieter Spronck

TL;DR
This paper presents a domain adaptation and data augmentation approach for quality estimation in machine translation, improving performance across domains, languages, and in zero-shot scenarios.
Contribution
It introduces a method that combines domain adaptation with data augmentation to enhance QE models' generalizability and effectiveness.
Findings
Significant performance improvements across multiple language pairs.
Enhanced cross-lingual inference capabilities.
Superior results in zero-shot learning scenarios.
Abstract
While quality estimation (QE) can play an important role in the translation process, its effectiveness relies on the availability and quality of training data. For QE in particular, high-quality labeled data is often lacking due to the high cost and effort associated with labeling such data. Aside from the data scarcity challenge, QE models should also be generalizable, i.e., they should be able to handle data from different domains, both generic and specific. To alleviate these two main issues -- data scarcity and domain mismatch -- this paper combines domain adaptation and data augmentation within a robust QE system. Our method first trains a generic QE model and then fine-tunes it on a specific domain while retaining generic knowledge. Our results show a significant improvement for all the language pairs investigated, better cross-lingual inference, and a superior performance in…
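The two-stage recipe described in the abstract (train a generic model, then fine-tune on a specific domain while retaining generic knowledge) can be illustrated with a toy sketch. Everything here is an assumption made for illustration: a linear regressor stands in for the QE model, the feature vectors and scores are synthetic, and generic knowledge is retained by mixing a sample of generic data into the fine-tuning set. The abstract does not say which retention mechanism the authors use, so the hypothetical `replay_ratio` parameter and all function names below are not taken from the paper.

```python
import random

def predict(w, x):
    """Linear quality score: weighted sum of features plus a bias (last weight)."""
    return sum(wi * xi for wi, xi in zip(w[:-1], x)) + w[-1]

def mse(w, data):
    """Mean squared error of the model on (features, score) pairs."""
    return sum((predict(w, x) - y) ** 2 for x, y in data) / len(data)

def sgd(w, data, epochs, lr, rng):
    """Plain stochastic gradient descent on squared error; returns new weights."""
    w = list(w)
    for _ in range(epochs):
        batch = list(data)
        rng.shuffle(batch)
        for x, y in batch:
            err = predict(w, x) - y
            for i, xi in enumerate(x):
                w[i] -= lr * err * xi
            w[-1] -= lr * err
    return w

def make_data(true_w, n, rng, noise=0.01):
    """Synthetic stand-in for labeled QE data (not real translation scores)."""
    data = []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(len(true_w) - 1)]
        data.append((x, predict(true_w, x) + rng.gauss(0, noise)))
    return data

def two_stage_train(generic, in_domain, rng, replay_ratio=1.0):
    """Stage 1: fit on generic data. Stage 2: fine-tune on in-domain data
    mixed with a sample of generic data (an assumed retention strategy)."""
    w0 = [0.0] * (len(generic[0][0]) + 1)
    generic_w = sgd(w0, generic, epochs=20, lr=0.05, rng=rng)
    k = min(len(generic), int(replay_ratio * len(in_domain)))
    mixed = in_domain + rng.sample(generic, k)
    adapted_w = sgd(generic_w, mixed, epochs=20, lr=0.02, rng=rng)
    return generic_w, adapted_w

rng = random.Random(0)
generic = make_data([0.5, -0.3, 0.1], 200, rng)    # large generic set
in_domain = make_data([0.7, -0.1, 0.2], 40, rng)   # small domain-specific set
generic_w, adapted_w = two_stage_train(generic, in_domain, rng)

print("in-domain MSE, generic model:", mse(generic_w, in_domain))
print("in-domain MSE, adapted model:", mse(adapted_w, in_domain))
```

Mixing generic examples into the second stage is one simple way to trade in-domain fit against forgetting; setting `replay_ratio` to 0 in this sketch reduces it to plain fine-tuning on the domain data alone.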
Taxonomy
Topics: Natural Language Processing Techniques · Interpreting and Communication in Healthcare · Topic Modeling
