Exploiting Language Relatedness in Machine Translation Through Domain Adaptation Techniques

Amit Kumar; Rupjyoti Baruah; Ajay Pratap; Mayank Swarnkar; Anil Kumar Singh

arXiv:2303.01793·cs.CL·March 6, 2023·1 cites

Open Access

TL;DR

This paper introduces a novel domain adaptation method leveraging language relatedness via sentence similarity scoring to improve machine translation quality for low-resource languages, demonstrated on Hindi-Nepali.

Contribution

It presents a new similarity-based filtering technique combined with domain adaptation methods to enhance translation quality in low-resource, related language pairs.

Findings

01. Improved BLEU scores by ~2 points with multi-domain adaptation.

02. Achieved ~3 BLEU point increase through fine-tuning.

03. Enhanced translation quality using iterative back-translation.

Abstract

One of the significant challenges of Machine Translation (MT) is the scarcity of large amounts of data, mainly sentence-aligned parallel corpora. If the evaluation is as rigorous as for resource-rich languages, both Neural Machine Translation (NMT) and Statistical Machine Translation (SMT) can produce good results with such large amounts of data. However, it is challenging to improve the quality of MT output for low-resource languages, especially in NMT and SMT. To tackle the challenges faced by MT, we present a novel approach that uses a scaled similarity score of sentences, especially for related languages, based on a 5-gram KenLM language model with Kneser-Ney smoothing, to filter in-domain data from out-of-domain corpora and so boost the translation quality of MT. Furthermore, we employ other domain adaptation techniques such as multi-domain, fine-tuning and iterative…
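
The filtering idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not give the exact scaling formula, so the length normalization, the min-max scaling to [0, 1], and the 0.5 threshold below are all assumptions, and the function names are hypothetical. A toy add-one-smoothed unigram scorer stands in for the 5-gram KenLM model so that the example runs without external dependencies; any function returning a sentence log-probability could be passed in its place.

```python
import math
from collections import Counter
from typing import Callable, List, Tuple


def train_unigram_scorer(in_domain: List[str]) -> Callable[[str], float]:
    """Toy stand-in for a language model (assumption, not the paper's 5-gram KenLM).

    Returns a function giving the add-one-smoothed unigram log10
    probability of a whitespace-tokenized sentence.
    """
    counts = Counter(tok for sent in in_domain for tok in sent.split())
    total = sum(counts.values())
    vocab = len(counts) + 1  # one extra slot for unseen tokens

    def score(sentence: str) -> float:
        return sum(
            math.log10((counts[tok] + 1) / (total + vocab))
            for tok in sentence.split()
        )

    return score


def scaled_scores(sentences: List[str], score: Callable[[str], float]) -> List[float]:
    """Length-normalize each log-probability, then min-max scale to [0, 1].

    Both steps are assumptions made for this sketch.
    """
    if not sentences:
        return []
    raw = [score(s) / max(len(s.split()), 1) for s in sentences]
    lo, hi = min(raw), max(raw)
    if hi == lo:
        return [1.0] * len(raw)
    return [(r - lo) / (hi - lo) for r in raw]


def filter_in_domain(
    sentences: List[str],
    score: Callable[[str], float],
    threshold: float = 0.5,  # hypothetical cut-off
) -> List[Tuple[str, float]]:
    """Keep out-of-domain sentences whose scaled score meets the threshold."""
    scaled = scaled_scores(sentences, score)
    return [(s, v) for s, v in zip(sentences, scaled) if v >= threshold]


if __name__ == "__main__":
    in_domain = [
        "the patient needs medicine",
        "the doctor gave medicine to the patient",
    ]
    pool = [
        "the patient met the doctor",
        "stock markets fell sharply today",
        "medicine helps the patient",
    ]
    scorer = train_unigram_scorer(in_domain)
    for sentence, value in filter_in_domain(pool, scorer):
        print(f"{value:.3f}\t{sentence}")
```

In this sketch the scorer is trained on in-domain text and applied to a general pool, so sentences that resemble the in-domain data receive higher scaled scores and survive the filter. For related languages, the in-domain text and the pool could come from different but related languages that share vocabulary; that reading of "language relatedness" is an interpretation of the abstract, not a detail it states.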

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification