Improving Machine Translation with Phrase Pair Injection and Corpus Filtering
Akshay Batheja, Pushpak Bhattacharyya

TL;DR
This paper combines phrase pair injection with corpus filtering to improve neural machine translation, gaining up to 2.7 BLEU points on the FLORES test data for three low-resource language pairs.
Contribution
It proposes extracting parallel phrases and sentences from a pseudo-parallel corpus and adding them to the parallel corpus used to train NMT models in low-resource settings.
Findings
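Improvements of up to 2.7 BLEU points on the FLORES test data, measured against models trained on the whole pseudo-parallel corpus augmented with the parallel corpus
Effective for 3 low-resource language pairs (Hindi-Marathi, English-Marathi, and English-Pashto) in all 6 translation directions
Augmenting the parallel corpus with phrases and sentences extracted from the pseudo-parallel corpus outperforms augmenting it with the whole pseudo-parallel corpus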
Abstract
In this paper, we show that the combination of Phrase Pair Injection and Corpus Filtering boosts the performance of Neural Machine Translation (NMT) systems. We extract parallel phrases and sentences from the pseudo-parallel corpus and augment it with the parallel corpus to train the NMT models. With the proposed approach, we observe an improvement in the Machine Translation (MT) system for 3 low-resource language pairs, Hindi-Marathi, English-Marathi, and English-Pashto, and 6 translation directions by up to 2.7 BLEU points, on the FLORES test data. These BLEU score improvements are over the models trained using the whole pseudo-parallel corpus augmented with the parallel corpus.
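The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how pairs are scored or which threshold is used, so `toy_score` (a word-count ratio), the `threshold` value, and all function names and sample strings here are hypothetical placeholders. In practice a stronger quality score, such as a cross-lingual sentence-similarity model, would presumably take the place of `toy_score`.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source text, target text)


def toy_score(src: str, tgt: str) -> float:
    """Hypothetical stand-in for a real quality score: word-count ratio in [0, 1]."""
    n_src, n_tgt = len(src.split()), len(tgt.split())
    longest = max(n_src, n_tgt)
    return min(n_src, n_tgt) / longest if longest else 0.0


def filter_pairs(pairs: List[Pair],
                 score: Callable[[str, str], float],
                 threshold: float) -> List[Pair]:
    """Keep only the pairs whose score reaches the threshold."""
    return [(s, t) for s, t in pairs if score(s, t) >= threshold]


def build_training_corpus(parallel: List[Pair],
                          pseudo_sentences: List[Pair],
                          pseudo_phrases: List[Pair],
                          score: Callable[[str, str], float],
                          threshold: float) -> List[Pair]:
    """Augment the parallel corpus with filtered sentence and phrase pairs."""
    kept = (filter_pairs(pseudo_sentences, score, threshold)
            + filter_pairs(pseudo_phrases, score, threshold))
    seen = set()
    corpus = []
    # Deduplicate while preserving order: parallel data first, then additions.
    for pair in parallel + kept:
        if pair not in seen:
            seen.add(pair)
            corpus.append(pair)
    return corpus


# Toy data with placeholder tokens (not real translations).
parallel = [("s1 s2", "t1 t2")]
pseudo_sentences = [("s3 s4 s5", "t3 t4 t5"),      # score 1.0, kept
                    ("s6", "t6 t7 t8 t9 t10 t11")]  # score 1/6, dropped
pseudo_phrases = [("s7 s8", "t12")]                 # score 0.5, kept

corpus = build_training_corpus(parallel, pseudo_sentences, pseudo_phrases,
                               toy_score, threshold=0.5)
```

The baseline the abstract compares against corresponds to skipping the filter and concatenating the whole pseudo-parallel corpus with the parallel corpus.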
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech Recognition and Synthesis
Methods: Test
