Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages -- A Singlish Case Study

Isaac Lim; Shaun Khoo; Roy Ka-Wei Lee; Watson Chua; Jia Yi Goh; Jessica Foo

arXiv:2502.12485·cs.CL·April 9, 2025

TL;DR

This paper presents a scalable safety alignment framework for low-resource languages like Singlish, demonstrating that combining SFT and KTO significantly reduces toxicity while maintaining benchmark performance.

Contribution

It introduces KTO-S and systematically compares safety alignment methods, highlighting superior sample efficiency and toxicity reduction in low-resource language settings.

Findings

1. SFT+KTO achieves superior safety alignment with higher sample efficiency than DPO.
2. KTO-S improves training stability through improved KL divergence regularization.
3. The approach reduces Singlish toxicity by 99% and generalizes to TOXIGEN.

Abstract

Ensuring the safety of Large Language Models (LLMs) in diverse linguistic settings remains challenging, particularly for low-resource languages. Existing safety alignment methods are English-centric, limiting their effectiveness. We systematically compare Supervised Fine-Tuning (SFT), Direct Preference Optimization (DPO), and Kahneman-Tversky Optimization (KTO) for aligning SEA-Lion-v2.1-Instruct, a Llama 3-8B variant, to reduce toxicity in Singlish. Our results show that SFT+KTO achieves superior safety alignment with higher sample efficiency than DPO. Additionally, we introduce KTO-S, which enhances stability via improved KL divergence regularization. Our approach reduces Singlish toxicity by 99%, generalizes to TOXIGEN, and maintains strong performance on standard LLM benchmarks, providing a scalable framework for safer AI deployment in multilingual contexts.
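To make the DPO versus KTO comparison concrete, the following is a minimal sketch of the two per-example objectives as they are commonly formulated in the original DPO and KTO papers. It is not the authors' training code, and it does not reproduce KTO-S, whose modified KL regularization is not specified on this page. The function names, the default `beta`, and the weights `lambda_d` and `lambda_u` are illustrative assumptions. The sketch shows why KTO can use unpaired data: each example needs only a binary desirable/undesirable label, whereas DPO needs a chosen and a rejected response for the same prompt.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def kto_loss(policy_logp: float, ref_logp: float, desirable: bool,
             kl_estimate: float, beta: float = 0.1,
             lambda_d: float = 1.0, lambda_u: float = 1.0) -> float:
    """Per-example KTO loss for one (prompt, response, label) triple.

    policy_logp / ref_logp: log-probability of the response under the
    policy and the frozen reference model.
    kl_estimate: batch-level estimate of KL(policy || reference), used
    as the reference point and clamped at zero (assumed precomputed).
    """
    reward = policy_logp - ref_logp      # implicit reward (log-ratio)
    z0 = max(kl_estimate, 0.0)           # reference point
    if desirable:
        return lambda_d * (1.0 - sigmoid(beta * (reward - z0)))
    return lambda_u * (1.0 - sigmoid(beta * (z0 - reward)))


def dpo_loss(policy_logp_chosen: float, ref_logp_chosen: float,
             policy_logp_rejected: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss; requires a chosen and a rejected response."""
    margin = ((policy_logp_chosen - ref_logp_chosen)
              - (policy_logp_rejected - ref_logp_rejected))
    return -math.log(sigmoid(beta * margin))


if __name__ == "__main__":
    # A safe response the policy already prefers over the reference.
    print(kto_loss(-10.0, -12.0, desirable=True, kl_estimate=0.5))
    # A toxic response the policy still assigns too much probability.
    print(kto_loss(-10.0, -12.0, desirable=False, kl_estimate=0.5))
    # DPO needs both responses for the same prompt.
    print(dpo_loss(-10.0, -12.0, -11.0, -10.0))
```

In this sketch, raising the policy's log-probability of a desirable response lowers the KTO loss, while raising it for an undesirable response increases the loss; the KL estimate only shifts the point at which the sigmoid saturates.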

Code & Models

Models

govtech/llama3-8b-sea-lionv2.1-instruct-secure (Hugging Face model; 8 downloads, 2 likes)

Videos

Safe at the Margins: A General Approach to Safety Alignment in Low-Resource English Languages – A Singlish Case Study

Taxonomy

Topics: Natural Language Processing Techniques