Exploring Text-to-Text Transformers for English to Hinglish Machine Translation with Synthetic Code-Mixing

Ganesh Jawahar; El Moatez Billah Nagoudi; Muhammad Abdul-Mageed; Laks V.S. Lakshmanan

arXiv:2105.08807 · cs.CL · May 20, 2021

TL;DR

This paper explores using Transformer-based models, especially mT5, for English to Hinglish translation, introducing synthetic code-mixing data generation and curriculum learning to improve performance in low-resource settings.

Contribution

It proposes a novel synthetic code-mixing method and a curriculum learning approach to enhance Transformer models for English-Hinglish translation.
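The abstract says only that the synthetic data is generated "from bilingual distributed representations". To give a rough feel for that idea, here is a minimal word-substitution sketch. It assumes English and romanized Hindi words share one embedding space and swaps a random subset of English tokens for their nearest Hindi neighbour by cosine similarity. The vocabulary, vectors, `mix_ratio` and function names are all hypothetical, and the paper's actual procedure may choose and replace words differently.

```python
import math
import random

# Hypothetical toy vectors in a shared English/Hindi space. A real system
# would load aligned bilingual embeddings with a much larger vocabulary.
EN_VECS = {
    "i": [1.0, 0.0, 0.0],
    "love": [0.0, 1.0, 0.1],
    "movies": [0.1, 0.0, 1.0],
}
HI_VECS = {  # romanized Hindi, as commonly used in Hinglish
    "main": [0.9, 0.1, 0.0],
    "pyaar": [0.0, 0.95, 0.2],
    "filmein": [0.2, 0.0, 0.9],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_hindi(word):
    """Return the Hindi word whose vector is closest to the English word's."""
    vec = EN_VECS[word]
    return max(HI_VECS, key=lambda h: cosine(vec, HI_VECS[h]))


def synthesize_code_mixed(sentence, mix_ratio=0.5, seed=0):
    """Lowercase and whitespace-split the sentence, then replace each known
    English token with its nearest Hindi neighbour with probability mix_ratio.
    Unknown tokens are kept unchanged."""
    rng = random.Random(seed)
    out = []
    for tok in sentence.lower().split():
        if tok in EN_VECS and rng.random() < mix_ratio:
            out.append(nearest_hindi(tok))
        else:
            out.append(tok)
    return " ".join(out)
```

With `mix_ratio=1.0` every known token is replaced, so "I love movies" becomes "main pyaar filmein"; lower ratios leave more English in place, which is closer to natural code-mixing.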

Findings

1. mT5 with curriculum learning achieves a BLEU score of 12.67.
2. Synthetic code-mixing data is competitive with standard methods such as backtranslation.
3. The models ranked first in the English-Hinglish shared task.
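For readers less familiar with the metric behind the 12.67 figure: BLEU is the geometric mean of clipped n-gram precisions (usually n = 1 to 4) multiplied by a brevity penalty, BP = exp(1 - r/c) when the candidate length c does not exceed the reference length r, and 1 otherwise. The sketch below computes a corpus-level score under simplifying assumptions (whitespace tokenisation, one reference per sentence, no smoothing). Standard toolkits such as sacreBLEU apply their own tokenisation and options, so this sketch is not expected to reproduce the reported number.

```python
import math
from collections import Counter
from typing import List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all contiguous n-grams in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Corpus-level BLEU on a 0-100 scale with one reference per hypothesis,
    whitespace tokenisation and no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_ngrams, r_ngrams = ngrams(h, n), ngrams(r, n)
            # Clipped counts: a hypothesis n-gram is credited at most as
            # often as it appears in the reference.
            matches[n - 1] += sum(min(c, r_ngrams[g]) for g, c in h_ngrams.items())
            totals[n - 1] += max(len(h) - n + 1, 0)
    # Without smoothing, any zero precision makes the geometric mean zero.
    if min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return 100 * brevity_penalty * math.exp(log_precision)
```

An exact match such as `corpus_bleu(["main ghar ja raha hoon"], ["main ghar ja raha hoon"])` scores 100, while a hypothesis sharing no words with its reference scores 0.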

Abstract

We describe models focused on the understudied problem of translating between monolingual and code-mixed language pairs. More specifically, we offer a wide range of models that convert monolingual English text into Hinglish (code-mixed Hindi and English). Given the recent success of pretrained language models, we also test the utility of two recent Transformer-based encoder-decoder models (i.e., mT5 and mBART) on the task, finding both to work well. Given the paucity of training data for code-mixing, we also propose a dependency-free method for generating code-mixed texts from bilingual distributed representations that we exploit for improving language model performance. In particular, armed with this additional data, we adopt a curriculum learning approach where we first finetune the language models on synthetic data and then on gold code-mixed data. We find that, although simple, our…
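The two-stage curriculum described in the abstract (finetune on synthetic data first, then on gold code-mixed data) amounts to an ordering of the training stream. The sketch below assumes the data are (English source, Hinglish target) string pairs that would be fed to some fine-tuning step for mT5 or mBART; the function name `curriculum_batches`, the `epochs_per_stage` parameter and the batching scheme are illustrative assumptions, not details taken from the paper.

```python
import random
from typing import Iterator, List, Sequence, Tuple

Pair = Tuple[str, str]  # (English source, Hinglish target)


def curriculum_batches(
    synthetic: Sequence[Pair],
    gold: Sequence[Pair],
    batch_size: int = 2,
    epochs_per_stage: int = 1,
    seed: int = 0,
) -> Iterator[Tuple[str, List[Pair]]]:
    """Hypothetical helper: yield (stage, batch) pairs, all synthetic-data
    batches first and then all gold-data batches. Examples are shuffled only
    within a stage, so the two data sources are never interleaved."""
    rng = random.Random(seed)
    for stage, data in (("synthetic", synthetic), ("gold", gold)):
        for _ in range(epochs_per_stage):
            order = list(data)
            rng.shuffle(order)
            for i in range(0, len(order), batch_size):
                yield stage, order[i:i + batch_size]
```

Each yielded batch would be handed to whatever fine-tuning step is in use. Because the stages are consumed in order, the model sees gold data only after it has finished the synthetic stage, which is the essence of this easy-to-hard (or noisy-to-clean) schedule.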

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Dense Connections · Attention Dropout · Multi-Head Attention · Gated Linear Unit · SentencePiece · Dropout · Adafactor