ThaiSafetyBench: Assessing Language Model Safety in Thai Cultural Contexts

Trapoom Ukarapol; Nut Chukamphaeng; Kunat Pipatanakul; Pakhapoom Sarapat

arXiv:2603.04992 · cs.CL · March 9, 2026

TL;DR

This paper introduces ThaiSafetyBench, a benchmark for evaluating the safety of language models on Thai-language prompts. It reveals that open-source models are vulnerable to culturally specific attacks and proposes a fine-tuned classifier for safety assessment.

Contribution

The work presents ThaiSafetyBench, a culturally nuanced safety benchmark for Thai LLMs, and develops ThaiSafetyClassifier, a fine-tuned model matching GPT-4.1 judgments for safety detection.

Findings

01

Closed-source models are generally safer than open-source models.

02

Culturally grounded attacks have higher success rates, exposing safety vulnerabilities.

03

The ThaiSafetyClassifier achieves an 84.4% F1 score, matching GPT-4.1 judgments.
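As a rough illustration of what an F1 score against a reference judge measures, the sketch below computes binary F1 for the "harmful" class. It assumes the GPT-4.1 verdicts serve as reference labels and the classifier outputs as predictions; the averaging scheme actually used in the paper is not stated here and may differ. The function name `f1_score` and the toy labels are hypothetical and are not taken from the paper.

```python
def f1_score(reference, predicted, positive=True):
    """Binary F1 for the `positive` class, given two equal-length label lists."""
    if len(reference) != len(predicted):
        raise ValueError("label lists must have the same length")
    pairs = list(zip(reference, predicted))
    tp = sum(1 for r, p in pairs if r == positive and p == positive)
    fp = sum(1 for r, p in pairs if r != positive and p == positive)
    fn = sum(1 for r, p in pairs if r == positive and p != positive)
    if tp == 0:
        return 0.0  # no true positives: precision and recall are both zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels (made up for illustration): True means "response judged harmful".
judge_labels = [True, True, True, False, False]       # assumed reference verdicts
classifier_labels = [True, True, False, True, False]  # assumed classifier output

score = f1_score(judge_labels, classifier_labels)
print(f"F1 = {score:.3f}")
```

Here the classifier agrees with the judge on two harmful responses, misses one, and raises one false alarm, so precision and recall are both 2/3.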

Abstract

The safety evaluation of large language models (LLMs) remains largely centered on English, leaving non-English languages and culturally grounded risks underexplored. In this work, we investigate LLM safety in the context of the Thai language and culture and introduce ThaiSafetyBench, an open-source benchmark comprising 1,954 malicious prompts written in Thai. The dataset covers both general harmful prompts and attacks that are explicitly grounded in Thai cultural, social, and contextual nuances. Using ThaiSafetyBench, we evaluate 24 LLMs, with GPT-4.1 and Gemini-2.5-Pro serving as LLM-as-a-judge evaluators. Our results show that closed-source models generally demonstrate stronger safety performance than open-source counterparts, raising important concerns regarding the robustness of openly available models. Moreover, we observe a consistently higher Attack Success Rate (ASR) for…
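To make the Attack Success Rate concrete, here is a minimal sketch that treats ASR as the fraction of prompts whose model response a judge flagged as harmful, and reports it per prompt group. This definition is an assumption based on common usage of the term; the record layout, the group names "general" and "cultural", and the toy verdicts are hypothetical and do not reproduce any result from the paper.

```python
from collections import defaultdict


def attack_success_rate(verdicts):
    """Fraction of verdicts that are True (response judged harmful)."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(1 for v in verdicts if v) / len(verdicts)


def asr_by_group(records):
    """Compute ASR separately for each group in (group, harmful) records."""
    grouped = defaultdict(list)
    for group, harmful in records:
        grouped[group].append(harmful)
    return {group: attack_success_rate(v) for group, v in grouped.items()}


# Toy judge verdicts (made up): one (prompt group, harmful?) pair per prompt.
records = [
    ("general", False), ("general", True), ("general", False), ("general", False),
    ("cultural", True), ("cultural", True), ("cultural", False), ("cultural", False),
]

rates = asr_by_group(records)
for group, rate in rates.items():
    print(f"{group}: ASR = {rate:.2f}")
```

A lower ASR indicates a safer model under this definition, since fewer malicious prompts elicit a harmful response.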

Code & Models

Models

typhoon-ai/ThaiSafetyClassifier (Hugging Face model)

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Topic Modeling · Hate Speech and Cyberbullying Detection