FGraDA: A Dataset and Benchmark for Fine-Grained Domain Adaptation in Machine Translation
Wenhao Zhu, Shujian Huang, Tong Pu, Pingxuan Huang, Xu Zhang, Jian Yu, Wei Chen, Yanfeng Wang, Jiajun Chen

TL;DR
This paper introduces FGraDA, a dataset and benchmark for fine-grained domain adaptation in machine translation. It targets sub-domains that have limited resources and no in-domain training data, a setting in which adaptation remains challenging.
Contribution
The paper presents a new dataset and benchmark for fine-grained domain adaptation in MT, emphasizing resource scarcity and heterogeneity in real-world scenarios.
Findings
Significant performance gaps remain in fine-grained domain adaptation.
Heterogeneous resources pose challenges for current MT models.
The dataset enables targeted evaluation of domain-specific translation issues.
Abstract
Previous research on adapting a general neural machine translation (NMT) model to a specific domain usually neglects the diversity in translation within the same domain, which is a core problem for domain adaptation in real-world scenarios. One representative of such challenging scenarios is deploying a translation system for a conference with a specific topic, e.g., global warming or coronavirus, where resources are usually extremely limited because of the tight schedule. To motivate wider investigation in such a scenario, we present a real-world fine-grained domain adaptation task in machine translation (FGraDA). The FGraDA dataset consists of a Chinese-English translation task for four sub-domains of information technology: autonomous vehicles, AI education, real-time networks, and smartphones. Each sub-domain is equipped with a development set and a test set for evaluation purposes.…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications
