Measuring Taiwanese Mandarin Language Understanding
Po-Heng Chen, Sijia Cheng, Wei-Lin Chen, Yen-Ting Lin, Yun-Nung Chen

TL;DR
This paper introduces TMLU, a comprehensive benchmark for evaluating Taiwanese Mandarin language understanding in LLMs, highlighting performance gaps and fostering development of localized models.
Contribution
It presents TMLU, a new evaluation suite for Taiwanese Mandarin LLMs, including diverse subjects, reasoning tasks, and a baseline analysis of 24 models.
Findings
Chinese open-weight models perform worse than multilingual proprietary models.
Taiwanese Mandarin models lag behind Simplified Chinese counterparts.
Taiwanese Mandarin LLMs still have significant room for improvement.
Abstract
The evaluation of large language models (LLMs) has drawn substantial attention in the field recently. This work focuses on evaluating LLMs in a Chinese context, specifically for Traditional Chinese, which has been largely underrepresented in existing benchmarks. We present TMLU, a holistic evaluation suite tailored for assessing the advanced knowledge and reasoning capability of LLMs in the context of Taiwanese Mandarin. TMLU consists of an array of 37 subjects across social science, STEM, humanities, Taiwan-specific content, and others, ranging from middle school to professional levels. In addition, we curate chain-of-thought-like few-shot explanations for each subject to facilitate the evaluation of complex reasoning skills. To establish a comprehensive baseline, we conduct extensive experiments and analysis on 24 advanced LLMs. The results suggest that Chinese open-weight models…
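The few-shot, chain-of-thought-style evaluation described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' harness: it assumes multiple-choice questions with lettered options (the text above does not state the question format), and the `Example` class, the prompt labels (`Question:`, `Explanation:`, `Answer:`), and all function names are hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Example:
    """One benchmark item (hypothetical schema, not TMLU's actual format)."""
    question: str
    choices: dict[str, str]  # option letter -> option text
    explanation: str         # chain-of-thought-style rationale (used for shots)
    answer: str              # gold option letter


def format_example(ex: Example, with_solution: bool) -> str:
    """Render one item; few-shot items include the explanation and answer."""
    lines = [f"Question: {ex.question}"]
    lines += [f"({letter}) {text}" for letter, text in ex.choices.items()]
    if with_solution:
        lines.append(f"Explanation: {ex.explanation}")
        lines.append(f"Answer: ({ex.answer})")
    else:
        # Leave the explanation open so the model reasons before answering.
        lines.append("Explanation:")
    return "\n".join(lines)


def build_prompt(shots: list[Example], target: Example) -> str:
    """Concatenate solved few-shot items, then the unsolved target item."""
    blocks = [format_example(s, with_solution=True) for s in shots]
    blocks.append(format_example(target, with_solution=False))
    return "\n\n".join(blocks)


ANSWER_RE = re.compile(r"Answer:\s*\(?([A-D])\)?")


def extract_answer(completion: str):
    """Return the first option letter following 'Answer:', or None.

    The first match is used because a model may keep generating
    further made-up questions after answering the target one.
    """
    match = ANSWER_RE.search(completion)
    return match.group(1) if match else None


def accuracy(predictions: list, golds: list[str]) -> float:
    """Fraction of items whose extracted answer equals the gold letter."""
    if not golds:
        return 0.0
    correct = sum(p == g for p, g in zip(predictions, golds))
    return correct / len(golds)
```

In use, `build_prompt` would be called once per test question with that subject's curated explanations as `shots`, the model's completion would be passed to `extract_answer`, and `accuracy` would be computed per subject. Unparseable completions count as wrong here, which is one possible design choice among several.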
Code & Models
- 🤗 yentinglin/Llama-3-Taiwan-70B-Instruct · model · 253 dl · ♡ 94
- 🤗 yentinglin/Llama-3-Taiwan-70B-Instruct-DPO · model · 102 dl · ♡ 9
- 🤗 yentinglin/Llama-3-Taiwan-70B-Instruct-128k · model · 85 dl · ♡ 7
- 🤗 yentinglin/Llama-3-Taiwan-8B-Instruct · model · 1.6k dl · ♡ 86
- 🤗 yentinglin/Llama-3-Taiwan-8B-Instruct-DPO · model · 7 dl · ♡ 5
- 🤗 yentinglin/Llama-3-Taiwan-8B-Instruct-128k · model · 76 dl · ♡ 11
- 🤗 chienweichang/Llama-3-Taiwan-8B-Instruct-128k-GGUF · model · 1.1k dl · ♡ 4
- 🤗 chienweichang/Llama-3-Taiwan-8B-Instruct-GGUF · model · 401 dl · ♡ 4
- 🤗 chienweichang/Llama-3-Taiwan-8B-Instruct-DPO-GGUF · model · 510 dl · ♡ 2
- 🤗 chienweichang/Llama-3-Taiwan-70B-Instruct-GGUF · model · 312 dl · ♡ 5
Taxonomy
Topics: Natural Language Processing Techniques
