Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing
Ganesh Jawahar, El Moatez Billah Nagoudi, Muhammad Abdul-Mageed, Laks V.S. Lakshmanan

TL;DR
This paper explores using Transformer-based models, especially mT5, for English to Hinglish translation, introducing synthetic code-mixing data generation and curriculum learning to improve performance in low-resource settings.
Contribution
It proposes a novel synthetic code-mixing method and a curriculum learning approach to enhance Transformer models for English-Hinglish translation.
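The abstract only says that the synthetic code-mixed text is generated "from bilingual distributed representations" and that the method is dependency-free. The sketch below assumes the simplest scheme that fits that description: every (romanized) Hindi token is swapped for its nearest English word in a shared bilingual embedding space, but only when the cosine similarity reaches a threshold. This substitution rule, the `code_mix` and `cosine` helpers, the `threshold` value, and the toy 3-d vectors are all hypothetical illustrations, not the authors' actual procedure.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def code_mix(
    hindi_tokens: List[str],
    hi_emb: Dict[str, Vector],
    en_emb: Dict[str, Vector],
    threshold: float = 0.8,  # hypothetical cut-off, not taken from the paper
) -> List[str]:
    """Swap each Hindi token for its nearest English word in a shared
    bilingual embedding space when the cosine similarity reaches `threshold`."""
    mixed: List[str] = []
    for tok in hindi_tokens:
        vec = hi_emb.get(tok)
        if vec is None or not en_emb:
            mixed.append(tok)  # no vector (or no English vocabulary): keep the token
            continue
        # Nearest English neighbour by cosine similarity.
        best_word, best_sim = max(
            ((word, cosine(vec, e)) for word, e in en_emb.items()),
            key=lambda pair: pair[1],
        )
        mixed.append(best_word if best_sim >= threshold else tok)
    return mixed


# Toy 3-d "aligned" embeddings, invented purely for illustration.
hi_emb = {
    "mujhe": [0.0, 0.0, 1.0],
    "yeh": [0.1, 0.95, 0.0],
    "kitaab": [0.9, 0.1, 0.0],
    "pasand": [0.0, 0.3, 0.9],
    "hai": [0.2, 0.0, 0.95],
}
en_emb = {"book": [1.0, 0.0, 0.0], "this": [0.0, 1.0, 0.0]}

example = code_mix("mujhe yeh kitaab pasand hai".split(), hi_emb, en_emb)
print(" ".join(example))  # mujhe this book pasand hai
```

The threshold controls how aggressively English is mixed in: lowering it swaps more Hindi words, while raising it keeps the output closer to monolingual Hindi.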
Findings
mT5 with curriculum learning achieves a BLEU score of 12.67
Synthetic code-mixing data is competitive with traditional methods
Models ranked first in the English-Hinglish shared task
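The headline result is reported as a BLEU score. For readers unfamiliar with the metric, here is a minimal corpus-level BLEU sketch: clipped n-gram precisions up to 4-grams, combined by a geometric mean and scaled by a brevity penalty. It assumes whitespace tokenization, one reference per sentence, and no smoothing, so it will not reproduce the 12.67 figure. Published scores are normally computed with a standard tool such as sacreBLEU, whose tokenization and defaults differ, and this page does not say which tool the authors used.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus BLEU on a 0-100 scale (single reference, whitespace tokens, no smoothing)."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipping: a hypothesis n-gram is credited at most as often as it occurs in the reference.
            matches[n - 1] += sum(min(c, r_counts[g]) for g, c in h_counts.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # without smoothing, any zero precision makes the geometric mean zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalize hypotheses that are shorter overall than the references.
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

Accumulating matches and totals over the whole corpus before taking the geometric mean is what makes this corpus-level rather than an average of sentence scores; a single short sentence with no matching 4-gram would otherwise score zero.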
Abstract
We describe models focused on the understudied problem of translating between monolingual and code-mixed language pairs. More specifically, we offer a wide range of models that convert monolingual English text into Hinglish (code-mixed Hindi and English). Given the recent success of pretrained language models, we also test the utility of two recent Transformer-based encoder-decoder models (i.e., mT5 and mBART) on the task, finding both to work well. Given the paucity of training data for code-mixing, we also propose a dependency-free method for generating code-mixed texts from bilingual distributed representations that we exploit for improving language model performance. In particular, armed with this additional data, we adopt a curriculum learning approach where we first finetune the language models on synthetic data and then on gold code-mixed data. We find that, although simple, our…
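The two-stage curriculum described in the abstract (finetune first on synthetic code-mixed data, then on gold code-mixed data) can be sketched as a generic training loop that fully consumes each stage before moving on. In this sketch `train_step` is a hypothetical callback standing in for one optimizer update of mT5 or mBART; the function name, the `epochs_per_stage` parameter, and the toy sentence pairs are illustrative assumptions. The abstract does not state how many epochs each stage runs or whether optimizer state is reset between stages.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

Example = Tuple[str, str]  # (English source, Hinglish target)
M = TypeVar("M")


def curriculum_finetune(
    model: M,
    stages: Sequence[Sequence[Example]],
    train_step: Callable[[M, Example], M],
    epochs_per_stage: int = 1,
) -> M:
    """Finetune on each stage in the given order, e.g. [synthetic, gold].

    Every stage is fully consumed before the next one starts, so the model
    finishes training on the last (gold) stage.
    """
    for stage in stages:
        for _ in range(epochs_per_stage):
            for example in stage:
                model = train_step(model, example)
    return model


# Toy stand-ins: the "model" is just the list of examples it has seen.
synthetic = [("I like this book", "mujhe this book pasand hai")]
gold = [("Where are you going?", "tum kahan ja rahe ho?")]


def record_step(seen: List[Example], example: Example) -> List[Example]:
    return seen + [example]


seen = curriculum_finetune([], [synthetic, gold], record_step)
print(seen)  # the synthetic pair comes first, then the gold pair
```

Ordering the stages this way means the noisier synthetic data shapes the model early, while the final updates come from gold code-mixed text that matches the target distribution.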
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Attention Dropout · Multi-Head Attention · Gated Linear Unit · SentencePiece · Dropout · Adafactor
