Domain Adaptation of Machine Translation with Crowdworkers

Makoto Morishita; Jun Suzuki; Masaaki Nagata

arXiv:2210.15861·cs.CL·October 31, 2022

TL;DR

This paper presents a framework that uses crowdworkers to collect domain-specific parallel data from the web, so that a machine translation model can be adapted to a new domain within days. Across five domains, the adapted model improved BLEU by +7.8 points on average.

Contribution

The paper introduces a crowdworker-based method for collecting target-domain parallel sentences from the web, which lets a machine translation model be adapted to a new domain in a few days at a reasonable cost.

Findings

01 Collected domain-specific data in days at reasonable cost
02 Achieved an average BLEU score improvement of +7.8 points
03 Improved translation quality across five different domains

Abstract

Although a machine translation model trained with a large in-domain parallel corpus achieves remarkable results, it still works poorly when no in-domain data are available. This situation restricts the applicability of machine translation when the target domain's data are limited. However, there is great demand for high-quality domain-specific machine translation models for many domains. We propose a framework that efficiently and effectively collects parallel sentences in a target domain from the web with the help of crowdworkers. With the collected parallel data, we can quickly adapt a machine translation model to the target domain. Our experiments show that the proposed method can collect target-domain parallel data over a few days at a reasonable cost. We tested it with five domains, and the domain-adapted model improved BLEU scores by up to +19.7 points, and by +7.8 points on average, compared…
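
The two headline figures in the abstract can be read as a maximum and a mean over per-domain BLEU gains. The sketch below shows that arithmetic only, assuming per-domain BLEU scores for a baseline model and an adapted model are already available. The function name, the domain names, and the scores are hypothetical placeholders and are not the paper's data.

```python
from statistics import mean


def summarize_bleu_gains(baseline, adapted):
    """Return (per-domain gains, mean gain, max gain) in BLEU points.

    `baseline` and `adapted` map a domain name to a BLEU score.
    """
    if baseline.keys() != adapted.keys():
        raise ValueError("both systems must be scored on the same domains")
    # Gain for each domain = adapted score minus baseline score.
    gains = {domain: adapted[domain] - baseline[domain] for domain in baseline}
    values = list(gains.values())
    return gains, mean(values), max(values)


# Placeholder scores for illustration only; NOT results from the paper.
baseline = {"domain_a": 20.0, "domain_b": 30.0}
adapted = {"domain_a": 25.0, "domain_b": 32.0}

gains, avg_gain, max_gain = summarize_bleu_gains(baseline, adapted)
print(f"mean gain: {avg_gain:+.1f} BLEU, max gain: {max_gain:+.1f} BLEU")
```

Reporting both numbers is useful because a mean alone can hide a domain that benefits far more, or far less, than the others.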

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications