What talking you?: Translating Code-Mixed Messaging Texts to English
Lynnette Hui Xian Ng, Luo Qi Chan

TL;DR
This paper investigates translating Singlish, a colloquial Singaporean English formed by code-mixing multiple Asian languages and dialects, into formal English using large language models. It describes the challenges involved and provides a dataset for future research.
Contribution
The study analyzes the limitations of LLMs in translating Singlish and releases a new dataset to facilitate further research in code-mixed language translation.
Findings
LLMs perform poorly on Singlish translation tasks.
Code-mixed languages pose significant challenges for current translation models.
A new dataset for Singlish translation is released for future work.
Abstract
Translation of code-mixed texts to formal English allows a wider audience to understand these code-mixed languages, and facilitates downstream analysis applications such as sentiment analysis. In this work, we look at translating Singlish, which is colloquial Singaporean English, to formal standard English. Singlish is formed through the code-mixing of multiple Asian languages and dialects. We analysed the presence of other Asian languages and variants, which can facilitate translation. Our dataset consists of short message texts, written as informal communication between Singlish speakers. We use a multi-step prompting scheme on five Large Language Models (LLMs) for language detection and translation. Our analysis shows that LLMs do not perform well in this task, and we describe the challenges involved in translation of code-mixed languages. We also release our dataset at this link…
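The abstract mentions a multi-step prompting scheme for language detection and translation but does not give the prompts or the models used. The following is only a minimal sketch of what a two-step scheme of that kind could look like, not the authors' implementation. The prompt wording, the function `translate_code_mixed`, the `llm` callable, and the stand-in `fake_llm` are all hypothetical names introduced here for illustration.

```python
from typing import Callable

# Hypothetical prompt wording; the paper's actual prompts are not given here.
DETECT_PROMPT = (
    "List the languages or dialects that appear in the following message, "
    "separated by commas.\nMessage: {text}"
)
TRANSLATE_PROMPT = (
    "The following message mixes these languages: {languages}.\n"
    "Translate it to formal standard English.\nMessage: {text}"
)


def translate_code_mixed(text: str, llm: Callable[[str], str]) -> dict:
    """Two-step prompting: detect the languages first, then translate."""
    # Step 1: ask the model which languages are present in the message.
    detected = llm(DETECT_PROMPT.format(text=text))
    languages = [part.strip() for part in detected.split(",") if part.strip()]

    # Step 2: pass the detected languages back in as context for translation.
    prompt = TRANSLATE_PROMPT.format(languages=", ".join(languages), text=text)
    translation = llm(prompt).strip()
    return {"languages": languages, "translation": translation}


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model; returns canned placeholder replies."""
    if prompt.startswith("List the languages"):
        return "English, Malay"
    return "PLACEHOLDER TRANSLATION"


result = translate_code_mixed("What talking you?", fake_llm)
print(result["languages"], result["translation"])
```

Passing the model in as a plain callable keeps the sketch independent of any particular LLM client, so the same function could be run against each model under comparison.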
Taxonomy
Topics: Digital Communication and Language
