Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning
Phakphum Artkaew

TL;DR
This paper introduces Thai Winograd Schemas, a new benchmark dataset for evaluating commonsense reasoning in Thai, highlighting the challenges faced by current large language models in multilingual understanding.
Contribution
It presents the first Thai-specific Winograd Schemas dataset, developed through native speaker validation, to assess and improve Thai language commonsense reasoning.
Findings
Large language models perform worse on the Thai schemas than on their English counterparts.
Models like GPT-4 show significant accuracy drops in Thai.
The benchmark reveals gaps in multilingual commonsense reasoning capabilities.
Abstract
Commonsense reasoning is an important aspect of natural language understanding, and several benchmarks have been developed to evaluate it. However, only a few of these benchmarks are available in languages other than English. Developing parallel benchmarks facilitates cross-lingual evaluation, enabling a better understanding of different languages. This research introduces a collection of Winograd Schemas in Thai, a novel dataset designed to evaluate commonsense reasoning capabilities in the context of the Thai language. Through a methodology involving native speakers, professional translators, and thorough validation, the schemas aim to closely reflect Thai language nuances, idioms, and cultural references while maintaining ambiguity and commonsense challenges. We evaluate the performance of popular large language models on this benchmark, revealing their strengths, limitations, and…
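To make the evaluation setup concrete, here is a minimal sketch of how a Winograd schema item might be represented and how accuracy over a set of items could be computed. This is not the paper's evaluation code: the names `WinogradItem`, `accuracy`, and `always_first` are hypothetical, and the example pair is the classic English trophy/suitcase schema rather than an item from the Thai dataset.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class WinogradItem:
    """One schema sentence with an ambiguous pronoun (hypothetical layout)."""
    sentence: str
    pronoun: str
    candidates: Tuple[str, str]  # the two possible referents
    answer: int                  # index into `candidates` of the correct referent


def accuracy(items: Sequence[WinogradItem],
             predict: Callable[[WinogradItem], int]) -> float:
    """Fraction of items for which `predict` returns the correct candidate index."""
    if not items:
        raise ValueError("cannot compute accuracy on an empty item list")
    correct = sum(1 for item in items if predict(item) == item.answer)
    return correct / len(items)


# The classic English schema pair: changing one word flips the referent.
ITEMS = [
    WinogradItem(
        sentence="The trophy doesn't fit in the brown suitcase because it is too big.",
        pronoun="it",
        candidates=("the trophy", "the suitcase"),
        answer=0,
    ),
    WinogradItem(
        sentence="The trophy doesn't fit in the brown suitcase because it is too small.",
        pronoun="it",
        candidates=("the trophy", "the suitcase"),
        answer=1,
    ),
]


def always_first(item: WinogradItem) -> int:
    """Trivial baseline standing in for a model: always pick the first candidate."""
    return 0


if __name__ == "__main__":
    print(f"baseline accuracy: {accuracy(ITEMS, always_first):.2f}")
```

Because each schema comes as a pair whose correct referents differ, a baseline that ignores the sentence gets exactly half of a paired set right, which is why chance performance on this kind of benchmark is 50%. In a real evaluation, `predict` would wrap a call to a language model and map its response onto one of the two candidates.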
Taxonomy
Topics: Logic, Reasoning, and Knowledge
Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections
