Domain Adaptation of Machine Translation with Crowdworkers
Makoto Morishita, Jun Suzuki, Masaaki Nagata

TL;DR
This paper presents a framework that uses crowdworkers to efficiently collect domain-specific parallel data from the web, enabling quick adaptation of machine translation models to new domains with significant improvements in translation quality.
Contribution
It introduces a novel crowdworker-based data collection method for domain adaptation in machine translation, reducing time and cost compared to traditional approaches.
Findings
Collected target-domain parallel data within a few days at a reasonable cost
Achieved an average BLEU score improvement of +7.8 points
Improved translation quality across five different domains
Abstract
Although a machine translation model trained with a large in-domain parallel corpus achieves remarkable results, it still works poorly when no in-domain data are available. This situation restricts the applicability of machine translation when the target domain's data are limited. However, there is great demand for high-quality domain-specific machine translation models for many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, we can quickly adapt a machine translation model to the target domain. Our experiments show that the proposed method can collect target-domain parallel data over a few days at a reasonable cost. We tested it with five domains, and the domain-adapted model improved BLEU scores by up to +19.7 points, and by +7.8 points on average, compared…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
