HintedBT: Augmenting Back-Translation with Quality and Transliteration Hints
Sahana Ramnath, Melvin Johnson, Abhirut Gupta, Aravindan Raghuveer

TL;DR
This paper introduces HintedBT, a technique that enhances back-translation for neural machine translation by providing quality and transliteration hints, leading to improved performance especially in low-resource, cross-script language pairs.
Contribution
The paper proposes a novel hint-based approach that uses tags to exploit noisy back-translation data and transliteration information, achieving state-of-the-art results in low-resource language translation.
Findings
Quality and transliteration hints, supplied as tags, improve translation quality.
With quality hints, the model learns effectively from noisy BT data without low-quality pairs being filtered out.
State-of-the-art performance on low-resource language pairs.
Abstract
Back-translation (BT) of target monolingual corpora is a widely used data augmentation strategy for neural machine translation (NMT), especially for low-resource language pairs. To improve effectiveness of the available BT data, we introduce HintedBT -- a family of techniques which provides hints (through tags) to the encoder and decoder. First, we propose a novel method of using both high and low quality BT data by providing hints (as source tags on the encoder) to the model about the quality of each source-target pair. We don't filter out low quality data but instead show that these hints enable the model to learn effectively from noisy data. Second, we address the problem of predicting whether a source token needs to be translated or transliterated to the target language, which is common in cross-script translation tasks (i.e., where source and target do not share the written…
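The quality-hint idea in the abstract (a source-side tag telling the model how reliable each BT pair is) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the per-pair quality score is assumed to be a number computed elsewhere, and the bin thresholds, the tag format `<binN>`, and the function names `quality_tag` and `tag_bt_pairs` are hypothetical.

```python
from typing import List, Sequence, Tuple

# Hypothetical thresholds separating quality bins; the abstract does not
# specify how bins are chosen, so these values are illustrative only.
DEFAULT_THRESHOLDS: Tuple[float, ...] = (0.5, 0.8)


def quality_tag(score: float,
                thresholds: Sequence[float] = DEFAULT_THRESHOLDS) -> str:
    """Map a quality score to a bin tag such as '<bin0>' (lowest bin)."""
    # Count how many thresholds the score meets or exceeds.
    bin_index = sum(score >= t for t in thresholds)
    return f"<bin{bin_index}>"


def tag_bt_pairs(
    pairs: List[Tuple[str, str, float]],
    thresholds: Sequence[float] = DEFAULT_THRESHOLDS,
) -> List[Tuple[str, str]]:
    """Prepend a quality tag to the synthetic source of each BT pair.

    Each input item is (synthetic_source, target, quality_score).
    No pair is dropped: low-quality pairs are kept and merely tagged.
    """
    return [
        (f"{quality_tag(score, thresholds)} {source}", target)
        for source, target, score in pairs
    ]


if __name__ == "__main__":
    bt_data = [
        ("synthetic source one", "target one", 0.91),
        ("synthetic source two", "target two", 0.32),
    ]
    for tagged_source, target in tag_bt_pairs(bt_data):
        print(tagged_source, "=>", target)
```

The point of the sketch is that tagging replaces filtering: every pair stays in the training set, and the tag is the only signal the encoder receives about how much to trust it.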
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
