Thai Winograd Schemas: A Benchmark for Thai Commonsense Reasoning

Phakphum Artkaew

arXiv:2405.18375 · cs.CL · December 17, 2024

TL;DR

This paper introduces Thai Winograd Schemas, a new benchmark dataset for evaluating commonsense reasoning in Thai, highlighting the challenges faced by current large language models in multilingual understanding.

Contribution

It presents the first Winograd Schemas dataset built specifically for Thai, developed and validated with native speakers, to assess and improve commonsense reasoning in Thai.

Findings

1. Large language models perform worse on the Thai schemas than on English ones.
2. Models such as GPT-4 show significant accuracy drops in Thai.
3. The benchmark reveals gaps in multilingual commonsense reasoning capabilities.

Abstract

Commonsense reasoning is an important aspect of natural language understanding, and several benchmarks have been developed to evaluate it. However, only a few of these benchmarks are available in languages other than English. Developing parallel benchmarks facilitates cross-lingual evaluation, enabling a better understanding of different languages. This research introduces a collection of Winograd Schemas in Thai, a novel dataset designed to evaluate commonsense reasoning capabilities in the context of the Thai language. Through a methodology involving native speakers, professional translators, and thorough validation, the schemas aim to closely reflect Thai language nuances, idioms, and cultural references while maintaining ambiguity and commonsense challenges. We evaluate the performance of popular large language models on this benchmark, revealing their strengths, limitations, and…
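
A Winograd schema pairs a sentence containing an ambiguous pronoun with two candidate referents, and one common way to score a model is the fraction of items on which it picks the correct referent. The minimal Python sketch below illustrates that scoring under stated assumptions: the Schema record, its field names, and the accuracy helper are hypothetical and are not taken from the PhakphumAdev/Thai-Winograd repository, and the example pair is the well-known English trophy/suitcase schema, used only for illustration rather than as an item from the Thai dataset.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass(frozen=True)
class Schema:
    """One Winograd-style item (hypothetical record layout, for illustration only)."""
    sentence: str
    pronoun: str
    options: Tuple[str, str]  # the two candidate referents
    answer: int  # index (0 or 1) of the correct referent in `options`


def accuracy(schemas: Sequence[Schema], predict: Callable[[Schema], int]) -> float:
    """Fraction of schemas for which `predict` returns the correct option index."""
    if not schemas:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(1 for s in schemas if predict(s) == s.answer)
    return correct / len(schemas)


# Illustrative twin pair: changing one word ("large" -> "small") flips the referent.
EXAMPLES = [
    Schema("The trophy doesn't fit into the brown suitcase because it is too large.",
           "it", ("the trophy", "the suitcase"), 0),
    Schema("The trophy doesn't fit into the brown suitcase because it is too small.",
           "it", ("the trophy", "the suitcase"), 1),
]


def always_first(schema: Schema) -> int:
    """Naive baseline that ignores the sentence and always picks option 0."""
    return 0


def oracle(schema: Schema) -> int:
    """Perfect predictor, used only as an upper bound for comparison."""
    return schema.answer


if __name__ == "__main__":
    print(accuracy(EXAMPLES, always_first))  # 0.5
    print(accuracy(EXAMPLES, oracle))        # 1.0
```

Because each item offers exactly two candidates and twin sentences flip the answer, a predictor that ignores the sentence gets one of each pair right, so chance level is 50%; in practice `predict` would wrap a call to the language model being evaluated.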

Code & Models

Repositories

PhakphumAdev/Thai-Winograd
PyTorch · Official

Taxonomy

Topics: Logic, Reasoning, and Knowledge

Methods: Attention Is All You Need · Linear Layer · Byte Pair Encoding · Label Smoothing · Adam · Residual Connection · Position-Wise Feed-Forward Layer · Multi-Head Attention · Dropout · Dense Connections