SEALGuard: Safeguarding the Multilingual Conversations in Southeast Asian Languages for LLM Software Systems
Wenliang Shan, Michael Fu, Rui Yang, Chakkrit Tantithamthavorn

TL;DR
SEALGuard is a multilingual safety guardrail for LLMs that effectively detects unsafe prompts across Southeast Asian languages, outperforming existing English-centric methods and addressing a critical safety gap.
Contribution
The paper introduces SEALGuard, a multilingual guardrail built using LoRA adaptation and a large-scale multilingual safety dataset, which improves safety detection in low-resource languages.
Findings
SEALGuard outperforms LlamaGuard in multilingual unsafe prompt detection.
Multilingual prompts significantly reduce LlamaGuard's effectiveness.
Model adaptation strategies enhance safety performance across languages.
Abstract
Safety alignment is critical for LLM-powered systems. While recent LLM-powered guardrail approaches such as LlamaGuard achieve high detection accuracy of unsafe inputs written in English (e.g., "How to create a bomb?"), they struggle with multilingual unsafe inputs. This limitation leaves LLM systems vulnerable to unsafe and jailbreak prompts written in low-resource languages such as those in Southeast Asia. This paper introduces SEALGuard, a multilingual guardrail designed to improve the safety alignment across diverse languages. It aims to address the multilingual safety alignment gap of existing guardrails and ensure effective filtering of unsafe and jailbreak prompts in LLM-powered systems. We adapt a general-purpose multilingual language model into a multilingual guardrail using low-rank adaptation (LoRA). We construct SEALSBench, a large-scale multilingual safety alignment…
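For readers unfamiliar with low-rank adaptation, the idea can be sketched in a few lines. This is a generic illustration of the LoRA update, not SEALGuard's actual implementation: the excerpt does not state the base model, rank, scaling factor, or which layers are adapted, so the names `lora_forward`, `W`, `A`, `B`, and `alpha` and all numeric values below are assumptions chosen for illustration. The frozen weight matrix `W` is left untouched, and only the two small matrices `A` and `B` would be trained.

```python
# Minimal pure-Python sketch of a LoRA-adapted linear layer (illustrative only).
# Shapes: W is d_out x d_in (frozen), A is r x d_in, B is d_out x r (trainable).

def matvec(M, v):
    """Multiply matrix M (list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def lora_forward(x, W, A, B, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the rank (rows of A)."""
    r = len(A)
    base = matvec(W, x)               # output of the frozen base layer
    update = matvec(B, matvec(A, x))  # low-rank correction
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, update)]

# Hypothetical toy values: a 2x2 frozen layer with a rank-1 adapter.
W = [[1.0, 2.0], [3.0, 4.0]]
x = [1.0, 1.0]

# With B initialised to zero, the adapted layer matches the base layer.
start = lora_forward(x, W, A=[[0.5, -0.5]], B=[[0.0], [0.0]], alpha=2.0)

# After (hypothetical) training, B is non-zero and shifts the output.
tuned = lora_forward(x, W, A=[[1.0, 0.0]], B=[[1.0], [2.0]], alpha=2.0)
```

Starting `B` at zero is the usual LoRA convention: fine-tuning begins from the base model's behaviour, and the adapter adds only r × (d_in + d_out) trainable values per adapted layer.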
