The University of Edinburgh's Submission to the WMT22 Code-Mixing Shared Task (MixMT)
Faheem Kirefu, Vivek Iyer, Pinzhen Chen, Laurie Burchell

TL;DR
This paper describes the University of Edinburgh's approach to the WMT22 code-mixed translation shared task, focusing on data curation, backtranslation, and pretraining techniques to improve Hinglish translation and generation.
Contribution
The paper presents data generation and pretraining strategies for low-resource code-mixed translation, with systems that ranked among the overall top performers in the shared task.
Findings
The baseline systems performed best in both subtasks.
Careful data curation improved translation quality.
Backtranslation enhanced low-resource translation performance.
Abstract
The University of Edinburgh participated in the WMT22 shared task on code-mixed translation. The task consists of two subtasks: i) generating code-mixed Hindi/English (Hinglish) text from parallel Hindi and English sentences and ii) machine translation from Hinglish to English. As both subtasks are considered low-resource, we focused our efforts on careful data generation and curation, especially the use of backtranslation from monolingual resources. For subtask 1, we explored the effects of constrained decoding on English and transliterated subwords in order to produce Hinglish. For subtask 2, we investigated different pretraining techniques, comparing simple initialisation from existing machine translation models with aligned augmentation. For both subtasks, we found that our baseline systems worked best. Our systems for both subtasks were one of the overall top-performing…
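As a rough illustration of the constrained decoding idea mentioned for subtask 1, the sketch below shows one greedy decoding step in which only a permitted set of subwords may be selected. It is a minimal sketch under stated assumptions, not the authors' implementation: the toy vocabulary, the scores, the function name `constrained_argmax`, and the rule that "allowed" means Latin-script (English or romanised Hindi) tokens are all hypothetical.

```python
import math


def constrained_argmax(logits, allowed_ids):
    """Pick the highest-scoring token id, considering only allowed_ids.

    Disallowed positions are masked to -inf so they can never win.
    """
    if not allowed_ids:
        raise ValueError("allowed_ids must not be empty")
    masked = [
        score if idx in allowed_ids else -math.inf
        for idx, score in enumerate(logits)
    ]
    return max(range(len(masked)), key=masked.__getitem__)


# Hypothetical toy vocabulary: English words, romanised Hindi, and Devanagari.
vocab = ["the", "movie", "acchi", "thi", "फ़िल्म", "अच्छी"]

# Assumption for this sketch: Hinglish output is Latin-script, so only
# ASCII tokens (English or transliterated subwords) are permitted.
allowed = {idx for idx, tok in enumerate(vocab) if tok.isascii()}

# Hypothetical model scores for one decoding step.
logits = [0.1, 1.2, 0.8, 0.3, 2.5, 1.9]

unconstrained = max(range(len(logits)), key=logits.__getitem__)
constrained = constrained_argmax(logits, allowed)

print(vocab[unconstrained])  # Devanagari token wins without the constraint
print(vocab[constrained])    # best Latin-script token wins with the constraint
```

In a real system the same masking would be applied to the model's output distribution at every step of beam search, and the allowed set would be derived from the actual subword vocabulary rather than an ASCII check.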
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Multimodal Machine Learning Applications
