Measuring Taiwanese Mandarin Language Understanding

Po-Heng Chen; Sijia Cheng; Wei-Lin Chen; Yen-Ting Lin; Yun-Nung Chen

arXiv:2403.20180 · cs.CL · April 1, 2024 · 1 citation

TL;DR

This paper introduces TMLU, a comprehensive benchmark for evaluating Taiwanese Mandarin language understanding in LLMs, highlighting performance gaps and fostering development of localized models.

Contribution

The paper presents TMLU, a new evaluation suite for Taiwanese Mandarin LLMs that covers 37 subjects, includes chain-of-thought-style few-shot explanations for reasoning tasks, and provides a baseline analysis of 24 models.

Findings

01. Chinese open-weight models perform worse than multilingual proprietary models.

02. Models tailored for Taiwanese Mandarin lag behind their Simplified Chinese counterparts.

03. There is significant room for improvement in Taiwanese Mandarin LLMs.

Abstract

The evaluation of large language models (LLMs) has drawn substantial attention in the field recently. This work focuses on evaluating LLMs in a Chinese context, specifically Traditional Chinese, which has been largely underrepresented in existing benchmarks. We present TMLU, a holistic evaluation suite tailored for assessing the advanced knowledge and reasoning capabilities of LLMs in the context of Taiwanese Mandarin. TMLU consists of 37 subjects across social science, STEM, humanities, Taiwan-specific content, and others, ranging from middle school to professional levels. In addition, we curate chain-of-thought-like few-shot explanations for each subject to facilitate the evaluation of complex reasoning skills. To establish a comprehensive baseline, we conduct extensive experiments and analysis on 24 advanced LLMs. The results suggest that Chinese open-weight models…
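
To make the evaluation setup described in the abstract more concrete, the following is a minimal sketch of how few-shot prompts with chain-of-thought-style explanations might be assembled for a multiple-choice item and how accuracy might be scored. It is not the TMLU harness: the `Item` fields, the English "Question / Explanation / Answer" template, the A-D option letters, and the answer-extraction regex are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Item:
    """One multiple-choice question (hypothetical schema, not TMLU's)."""
    question: str
    choices: Dict[str, str]   # option letter -> option text
    answer: str               # gold option letter
    explanation: str = ""     # rationale, used only for few-shot examples


def format_item(item: Item, with_answer: bool) -> str:
    """Render one item; few-shot examples include rationale and answer."""
    lines = [f"Question: {item.question}"]
    lines += [f"({letter}) {text}" for letter, text in item.choices.items()]
    if with_answer:
        lines.append(f"Explanation: {item.explanation}")
        lines.append(f"Answer: {item.answer}")
    else:
        # Leave the explanation open so the model reasons before answering.
        lines.append("Explanation:")
    return "\n".join(lines)


def build_prompt(few_shot: List[Item], target: Item) -> str:
    """Concatenate solved examples followed by the unsolved target item."""
    blocks = [format_item(ex, with_answer=True) for ex in few_shot]
    blocks.append(format_item(target, with_answer=False))
    return "\n\n".join(blocks)


def extract_answer(completion: str) -> Optional[str]:
    """Pull the first 'Answer: X' letter out of a model completion."""
    match = re.search(r"Answer:\s*\(?([A-D])\)?", completion)
    return match.group(1) if match else None


def accuracy(items: List[Item], completions: List[str]) -> float:
    """Fraction of items whose extracted letter matches the gold letter."""
    correct = sum(
        extract_answer(c) == it.answer for it, c in zip(items, completions)
    )
    return correct / len(items)
```

In this sketch a completion with no parseable answer counts as incorrect, which is one common convention; a real harness may instead compare option log-probabilities or use a different extraction rule.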

Taxonomy

Topics: Natural Language Processing Techniques