MALM: Mixing Augmented Language Modeling for Zero-Shot Machine Translation
Kshitij Gupta

TL;DR
This paper introduces MALM, a method that enhances zero-shot multilingual machine translation by combining prompt conditioning, self-supervised pre-training, and data augmentation, reducing off-target language errors.
Contribution
It demonstrates that prompt-conditioned large models effectively mitigate off-target language errors in zero-shot translation, and that self-supervised pre-training and data augmentation are effective for zero-shot multilingual translation.
Findings
Prompt-conditioned models do not suffer from off-target language errors.
Self-supervised pre-training improves zero-shot translation quality.
Data augmentation enhances multilingual translation performance.
Abstract
Large pre-trained language models have brought remarkable progress in NLP. Pre-training and fine-tuning have delivered state-of-the-art performance across text-processing tasks. Data augmentation techniques have also helped build state-of-the-art models for low- or zero-resource tasks. Many past works have attempted to learn a single massively multilingual machine translation model for zero-shot translation. Although those models produce correct translations, the main challenge is that they produce output in the wrong language for zero-shot translation. This work and its results indicate that prompt-conditioned large models do not suffer from off-target language errors, i.e., errors arising from translation into the wrong language. We empirically demonstrate the effectiveness of self-supervised pre-training and data augmentation for zero-shot multi-lingual machine…
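The off-target language error described in the abstract can be quantified as the share of outputs that land in a language other than the requested one. The sketch below is illustrative only and is not taken from the paper: the function name `off_target_rate` is hypothetical, and it assumes the language of each model output has already been detected by some external language identifier.

```python
from typing import Sequence


def off_target_rate(predicted_langs: Sequence[str],
                    target_langs: Sequence[str]) -> float:
    """Fraction of outputs whose detected language differs from the requested one.

    Both arguments are language codes (for example "de" or "fr").
    Assumption: the codes in predicted_langs come from an external
    language-identification step that is not shown here.
    """
    if len(predicted_langs) != len(target_langs):
        raise ValueError("inputs must have the same length")
    if not target_langs:
        return 0.0
    # Count the outputs written in a language other than the one requested.
    wrong = sum(p != t for p, t in zip(predicted_langs, target_langs))
    return wrong / len(target_langs)


# One of four outputs is in English instead of the requested French.
rate = off_target_rate(["de", "fr", "en", "de"], ["de", "fr", "fr", "de"])
```

A rate of 0.0 would correspond to the abstract's claim that a model does not suffer from off-target errors on the evaluated set.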
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
