Toxicity-Aware Few-Shot Prompting for Low-Resource Singlish Translation

Ziyu Ge; Gabriel Chua; Leanne Tan; Roy Ka-Wei Lee

arXiv:2507.11966·cs.CL·July 17, 2025

TL;DR

This paper introduces a two-stage, toxicity-aware translation framework for low-resource, code-mixed Singlish, combining human-verified prompt engineering and model benchmarking to improve translation quality and safety.

Contribution

The paper presents a reproducible pipeline for toxicity-preserving translation in low-resource languages, built on few-shot prompting and model benchmarking.

Findings

01. Effective translation of Singlish with preserved slang and toxicity nuances.

02. Human evaluation confirms improved translation quality and safety.

03. Framework supports culturally sensitive moderation in low-resource settings.

Abstract

As online communication increasingly incorporates under-represented languages and colloquial dialects, standard translation systems often fail to preserve local slang, code-mixing, and culturally embedded markers of harmful speech. Translating toxic content between low-resource language pairs poses additional challenges due to scarce parallel data and safety filters that sanitize offensive expressions. In this work, we propose a reproducible, two-stage framework for toxicity-preserving translation, demonstrated on a code-mixed Singlish safety corpus. First, we perform human-verified few-shot prompt engineering: we iteratively curate and rank annotator-selected Singlish-target examples to capture nuanced slang, tone, and toxicity. Second, we optimize model-prompt pairs by benchmarking several large language models using semantic similarity via direct and back-translation. Quantitative…
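
The second stage described in the abstract, choosing a model-prompt pair by comparing back-translations against the original text, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `bow_cosine` and `rank_pairs`, the model and prompt labels, and the example sentences are hypothetical, and the bag-of-words cosine is only a toy stand-in for whatever semantic similarity metric the authors actually use. The sketch also covers only the back-translation comparison, not the direct-translation comparison.

```python
from collections import Counter
from math import sqrt


def bow_cosine(a: str, b: str) -> float:
    """Toy similarity: cosine over lowercase word counts.

    Assumption: a real setup would likely use sentence embeddings instead.
    """
    ca, cb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(ca[w] * cb[w] for w in ca)
    norm = sqrt(sum(v * v for v in ca.values())) * sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def rank_pairs(sources, back_translations, similarity=bow_cosine):
    """Rank (model, prompt) pairs by mean similarity of round-trip text to the source.

    back_translations maps a hypothetical (model, prompt) label to a list of
    back-translated sentences aligned one-to-one with `sources`.
    """
    if not sources:
        raise ValueError("sources must not be empty")
    scores = {}
    for pair, outputs in back_translations.items():
        if len(outputs) != len(sources):
            raise ValueError(f"{pair}: expected {len(sources)} outputs, got {len(outputs)}")
        total = sum(similarity(s, o) for s, o in zip(sources, outputs))
        scores[pair] = total / len(sources)
    # Highest mean similarity first.
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Illustrative data only; not taken from the paper's corpus.
sources = ["wah this one damn sian lah"]
back_translations = {
    ("model_a", "prompt_1"): ["wah this one damn sian lah"],  # round trip keeps the slang
    ("model_b", "prompt_1"): ["this is very boring"],         # round trip sanitizes it
}
ranked = rank_pairs(sources, back_translations)
best_pair, best_score = ranked[0]
```

In this toy setup, a pair whose round trip preserves the colloquial wording scores higher than one that paraphrases it into standard English, which is the behavior a toxicity-preserving benchmark would want to reward. A lexical metric like this one penalizes legitimate paraphrase, so it should be read as a placeholder rather than a recommended measure.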

Taxonomy

Topics: Natural Language Processing Techniques