WangchanThaiInstruct: An instruction-following Dataset for Culture-Aware, Multitask, and Multi-domain Evaluation in Thai

Peerat Limkonchotiwat; Pume Tuchinda; Lalita Lowphansirikul; Surapon Nonesung; Panuthep Tasawong; Alham Fikri Aji; Can Udomcharoenchaikit; Sarana Nutanong

arXiv:2508.15239·cs.CL·September 22, 2025

TL;DR

This paper introduces WangchanThaiInstruct, a culture-aware, human-authored Thai dataset for evaluating and instruction-tuning language models on domain-specific tasks in a low-resource language, and it highlights the importance of native supervision.

Contribution

The paper presents a new Thai instruction dataset, built through a multi-stage quality control process, that supports both the evaluation and the instruction tuning of language models on culturally and professionally specific tasks.

Findings

01. Models fine-tuned on WangchanThaiInstruct outperform models fine-tuned on translated data.

02. Native supervision improves model performance on both in-domain and out-of-domain tasks.

03. Culturally grounded data is crucial for aligning language models in low-resource languages.

Abstract

Large language models excel at instruction-following in English, but their performance in low-resource languages like Thai remains underexplored. Existing benchmarks often rely on translations, missing cultural and domain-specific nuances needed for real-world use. We present WangchanThaiInstruct, a human-authored Thai dataset for evaluation and instruction tuning, covering four professional domains and seven task types. Created through a multi-stage quality control process with annotators, domain experts, and AI researchers, WangchanThaiInstruct supports two studies: (1) a zero-shot evaluation showing performance gaps on culturally and professionally specific tasks, and (2) an instruction tuning study with ablations isolating the effect of native supervision. Models fine-tuned on WangchanThaiInstruct outperform those using translated data in both in-domain and out-of-domain benchmarks.…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification