From Standard English to Singlish: A Retrieval-Augmented Approach for Code-Switched Creole Generation in Large Language Models
Foong Ming Lai, Yujin Tan, Han Meng, Yi-Chieh Lee

TL;DR
This paper introduces a retrieval-augmented generation framework that externalizes dialectal knowledge into a lexicon, enabling controlled code-switching in large language models without fine-tuning.
Contribution
The paper presents a RAG approach to code-switching that draws on an external curated lexicon rather than fine-tuning, giving finer control over dialectal generation while preserving output quality.
Findings
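RAG performs minimal lexical substitutions (median 1 edit) with higher semantic preservation (mean cosine similarity 0.978 vs. 0.926 for zero-shot prompting).
Zero-shot prompting induces extensive paraphrasing (median 23 token edits).
Human evaluation with 164 Singaporean participants found RAG and zero-shot prompting equally natural and appropriate.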
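The edit-count and semantic-preservation comparisons behind these findings could be computed roughly as in the sketch below. It assumes that "token edits" means a whitespace-token Levenshtein distance and that cosine similarity is taken over sentence embedding vectors. The summary specifies neither the tokenizer nor the embedding model, so both function names and both choices here are illustrative assumptions, not the authors' evaluation code.

```python
import math


def token_edit_distance(source: str, target: str) -> int:
    """Levenshtein distance over whitespace tokens (assumed metric definition)."""
    a, b = source.split(), target.split()
    # prev[j] holds the distance between the first i-1 tokens of a and first j of b.
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # delete tok_a
                            curr[j - 1] + 1,      # insert tok_b
                            prev[j - 1] + cost))  # keep or substitute
        prev = curr
    return prev[-1]


def cosine_similarity(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Under this definition, a single-word swap such as "eat" to "makan" counts as one edit, while a full paraphrase accumulates one edit per inserted, deleted, or changed token.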
Abstract
Code-switching in contact varieties like Singaporean English (Singlish) challenges natural language generation due to limited parallel data and rapid lexical evolution. We propose a retrieval-augmented generation (RAG) framework that externalizes dialectal knowledge into a curated lexicon, enabling controlled lexical code-switching without fine-tuning. Our approach retrieves candidate Singlish expressions and guides generation through sparse lexical substitution. Human evaluation with 164 Singaporean participants found RAG and zero-shot prompting equally natural and appropriate. Automatic analyses reveal different transformation regimes: zero-shot prompting induces extensive paraphrasing (median 23 token edits), whereas RAG performs minimal substitutions (median 1 edit) with higher semantic preservation (mean cosine similarity 0.978 vs. 0.926). Our results demonstrate that externalizing…
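To make the retrieve-then-substitute idea concrete, here is a minimal sketch under strong simplifying assumptions. The lexicon `TOY_LEXICON`, the helpers `retrieve_candidates` and `sparse_substitute`, and the `max_edits` cap are all hypothetical. In the paper the retrieved candidates guide an LLM's generation; this sketch swaps in a deterministic word replacement for that step, purely to illustrate what sparse lexical substitution looks like.

```python
import re

# Hypothetical toy lexicon (Standard English -> Singlish); not the paper's curated resource.
TOY_LEXICON = {
    "eat": "makan",
    "delicious": "shiok",
    "embarrassed": "paiseh",
}

WORD_PATTERN = re.compile(r"[A-Za-z']+")


def retrieve_candidates(sentence: str, lexicon: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source word, Singlish candidate) pairs for words found in the lexicon."""
    tokens = WORD_PATTERN.findall(sentence.lower())
    return [(tok, lexicon[tok]) for tok in tokens if tok in lexicon]


def sparse_substitute(sentence: str, lexicon: dict[str, str], max_edits: int = 1) -> str:
    """Replace at most `max_edits` lexicon matches, leaving all other text untouched.

    Stand-in for the LLM-guided generation step (an assumption of this sketch).
    """
    edits = 0

    def replace(match: re.Match) -> str:
        nonlocal edits
        word = match.group(0)
        key = word.lower()
        if key in lexicon and edits < max_edits:
            edits += 1
            return lexicon[key]
        return word

    return WORD_PATTERN.sub(replace, sentence)


if __name__ == "__main__":
    text = "Let's eat, the food is delicious."
    print(retrieve_candidates(text, TOY_LEXICON))
    print(sparse_substitute(text, TOY_LEXICON, max_edits=1))
```

Capping the number of substitutions is one simple way to keep the output close to the source sentence, in the spirit of the single-edit median reported for RAG; how the actual system limits edits is not stated in the truncated abstract.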
