TWGuard: A Case Study of LLM Safety Guardrails for Localized Linguistic Contexts
Hua-Rong Chu, Kuan-Chun Wang, and Yao-Te Huang

TL;DR
This paper introduces TWGuard, an LLM safety guardrail optimized for the Taiwan linguistic context, which substantially improves safety metrics by accounting for regional linguistic nuances.
Contribution
It presents a novel approach to optimizing LLM safety guardrails for specific linguistic contexts using curated regional datasets, demonstrated on the Taiwan linguistic context.
Findings
TWGuard achieves a +0.289 F1 improvement over its foundation model.
It lowers the false positive rate by 0.037 (a 94.9% reduction) compared with the strongest baseline in practical use.
The approach highlights the importance of regional linguistic considerations in AI safety standards.
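The findings above are stated in terms of F1 and false positive rate (FPR). The minimal sketch below shows how these metrics are computed for a guardrail treated as a binary classifier. It assumes that label 1 means "flagged unsafe" and label 0 means "safe"; that framing, and every helper name here (`confusion`, `f1`, `fpr`, `implied_fprs`), is hypothetical and not taken from the paper. It also shows how an absolute FPR drop and a relative drop together pin down the FPRs before and after. This part assumes that the 94.9% figure is measured relative to the baseline's FPR.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    """Outcome counts for a binary guardrail (hypothetical: 1 = unsafe, 0 = safe)."""
    tp: int  # unsafe inputs correctly flagged
    fp: int  # safe inputs wrongly flagged (the false positives)
    tn: int  # safe inputs correctly passed
    fn: int  # unsafe inputs missed


def confusion(y_true, y_pred):
    """Count the four outcomes from paired gold labels and predictions."""
    pairs = list(zip(y_true, y_pred))
    return Confusion(
        tp=sum(1 for t, p in pairs if t == 1 and p == 1),
        fp=sum(1 for t, p in pairs if t == 0 and p == 1),
        tn=sum(1 for t, p in pairs if t == 0 and p == 0),
        fn=sum(1 for t, p in pairs if t == 1 and p == 0),
    )


def f1(c):
    """F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall."""
    denom = 2 * c.tp + c.fp + c.fn
    return 2 * c.tp / denom if denom else 0.0


def fpr(c):
    """FPR = FP / (FP + TN): the share of safe inputs that get blocked."""
    denom = c.fp + c.tn
    return c.fp / denom if denom else 0.0


def implied_fprs(abs_drop, rel_drop):
    """Back out (baseline FPR, new FPR) from an absolute and a relative drop.

    Assumes rel_drop = abs_drop / baseline_fpr, i.e. the relative figure
    is measured against the baseline's FPR.
    """
    baseline = abs_drop / rel_drop
    return baseline, baseline - abs_drop


if __name__ == "__main__":
    c = confusion([1, 1, 0, 0, 0], [1, 0, 1, 0, 0])
    print(c, f1(c), fpr(c))
    print(implied_fprs(0.037, 0.949))
```

Under the stated assumption, the reported -0.037 and 94.9% figures imply a baseline FPR of roughly 0.039 and a TWGuard FPR of roughly 0.002. These are derived values, not reported ones, and they are approximate because both inputs are rounded.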
Abstract
Safety guardrails have become an active area of AI safety research, aimed at ensuring that large language models (LLMs) behave appropriately. However, existing research gives little consideration to nuances across linguistic and cultural contexts, resulting in a gap between reported performance and in-the-wild effectiveness. To address this issue, this paper proposes an approach that optimizes guardrail models for a designated linguistic context by leveraging a curated dataset tailored to local linguistic characteristics, targeting the Taiwan linguistic context as a representative example of localized deployment challenges. The proposed approach yields TWGuard, a linguistic context-optimized guardrail model that achieves a large gain (+0.289 in F1) over the foundation model and significantly outperforms the strongest baseline in practical use (-0.037 in false positive rate, a 94.9%…
