CANTONMT: Investigating Back-Translation and Model-Switch Mechanisms for Cantonese-English Neural Machine Translation
Kung Yin Hong, Lifeng Han, Riza Batista-Navarro, Goran Nenadic

TL;DR
This paper presents a novel Cantonese-English neural machine translation model utilizing back-translation and model-switch mechanisms, achieving competitive results against commercial systems and providing an open-source translation tool.
Contribution
It introduces a new approach combining back-translation and model-switch techniques for low-resource Cantonese-English translation, along with a new dataset and evaluation framework.
Findings
Best model (NLLB-mBART) matches or outperforms commercial translators on automatic metrics.
Model switch mechanisms improve translation quality.
Open-source web app enables practical Cantonese-English translation.
Abstract
This paper investigates the development and evaluation of machine translation models from Cantonese to English, where we propose a novel approach to tackle low-resource language translations. The main objectives of the study are to develop a model that can effectively translate Cantonese to English and evaluate it against state-of-the-art commercial models. To achieve this, a new parallel corpus has been created by combining different available corpora online with preprocessing and cleaning. In addition, a monolingual Cantonese dataset has been created through web scraping to aid the synthetic parallel corpus generation. Following the data collection process, several approaches, including fine-tuning models, back-translation, and model switch, have been used. The translation quality of models has been evaluated with multiple quality metrics, including lexicon-based metrics (SacreBLEU…
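As a rough illustration of the synthetic-corpus step the abstract describes, here is a minimal Python sketch. It rests on assumptions not confirmed by this summary: that synthetic pairs are produced by translating scraped monolingual Cantonese with one model, and that "model switch" means the generating model differs from the model later fine-tuned. The names `build_synthetic_corpus`, `merge_corpora`, and `toy_generator_model` are hypothetical, and the toy dictionary stands in for a real translation model.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (cantonese_sentence, english_sentence)


def build_synthetic_corpus(
    monolingual_yue: Iterable[str],
    translate: Callable[[str], str],
) -> List[Pair]:
    """Translate monolingual Cantonese into synthetic (Cantonese, English) pairs.

    `translate` is any callable wrapping the generating model. Under the
    assumed reading of "model switch", it would be a different model from
    the one fine-tuned on the resulting data.
    """
    seen = set()
    pairs: List[Pair] = []
    for sentence in monolingual_yue:
        sentence = sentence.strip()
        if not sentence or sentence in seen:
            continue  # skip blank lines and exact duplicates
        seen.add(sentence)
        english = translate(sentence).strip()
        if english:  # drop sentences the generator could not translate
            pairs.append((sentence, english))
    return pairs


def merge_corpora(real: List[Pair], synthetic: List[Pair]) -> List[Pair]:
    """Append synthetic pairs whose source side is not already in the real corpus."""
    real_sources = {src for src, _ in real}
    return list(real) + [p for p in synthetic if p[0] not in real_sources]


# Toy stand-in for a translation model (hypothetical, for illustration only).
_TOY_LEXICON = {"你好": "hello", "多謝": "thank you"}


def toy_generator_model(sentence: str) -> str:
    return _TOY_LEXICON.get(sentence, "")


real_parallel = [("早晨", "good morning")]
scraped_monolingual = ["你好", "多謝", "你好", "  ", "未知"]

synthetic = build_synthetic_corpus(scraped_monolingual, toy_generator_model)
training_data = merge_corpora(real_parallel, synthetic)
```

In practice `translate` would wrap a fine-tuned model, and the merged data would feed the next fine-tuning round. The filtering shown here (deduplication, dropping empty outputs) is a generic precaution rather than the paper's documented cleaning procedure.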
Taxonomy
Topics: Natural Language Processing Techniques
