An Improved Traditional Chinese Evaluation Suite for Foundation Model

Zhi-Rui Tam; Ya-Ting Pai; Yen-Wei Lee; Jun-Da Chen; Wei-Min Chu; Sega Cheng; Hong-Han Shuai

arXiv:2403.01858·cs.CL·July 12, 2024·2 cites

TL;DR

TMMLU+ is a comprehensive Traditional Chinese language understanding benchmark that reveals current LLMs lag behind human performance and highlights areas for future model improvements.

Contribution

Introduces TMMLU+, a benchmark that is six times larger than its predecessor TMMLU and more balanced in subject distribution, and uses it to evaluate LLMs on Traditional Chinese across diverse subjects.

Findings

01. Traditional Chinese models lag behind Simplified Chinese models.

02. Current LLMs do not match human performance.

03. Fertility score correlates strongly with benchmark results.
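
The page does not define the fertility score. As a rough illustration only, the sketch below assumes the common reading of tokenizer fertility as the average number of tokens produced per unit of text, measured here per character because Chinese is not whitespace-delimited. The function name `fertility` and the two toy tokenizers are hypothetical and are not taken from the paper.

```python
def fertility(texts, tokenize):
    """Average number of tokens per character over a list of strings.

    Assumption: fertility = total tokens / total characters. The paper
    may use a different unit (for example, tokens per word).
    """
    total_tokens = sum(len(tokenize(text)) for text in texts)
    total_chars = sum(len(text) for text in texts)
    return total_tokens / total_chars


def char_tokenize(text):
    # Toy tokenizer: one token per character.
    return list(text)


def byte_tokenize(text):
    # Toy tokenizer: one token per UTF-8 byte.
    return list(text.encode("utf-8"))


sample = ["繁體中文"]
char_fertility = fertility(sample, char_tokenize)
byte_fertility = fertility(sample, byte_tokenize)
```

Under this definition a lower value means the tokenizer represents Traditional Chinese text more compactly. In practice `tokenize` would wrap a real model tokenizer rather than these toy functions.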

Abstract

We present TMMLU+, a new benchmark designed for Traditional Chinese language understanding. TMMLU+ is a multiple-choice question-answering dataset covering 66 subjects, from elementary to professional level. It is six times larger than its predecessor, Taiwan Massive Multitask Language Understanding (TMMLU), and has a more balanced subject distribution. We also benchmark closed-source models and 26 open-weight Chinese large language models (LLMs), with parameter counts ranging from 1.8B to 72B, on the proposed TMMLU+. Our findings reveal that (1) Traditional Chinese models still trail their Simplified Chinese counterparts, highlighting the need for more focused work on LLMs that cater to Traditional Chinese, and (2) current LLMs still fall short of human performance in average scores, indicating a potential need for future research to delve deeper into social science and humanities subjects.…
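
To make "average scores" on a multi-subject multiple-choice benchmark concrete, here is a minimal sketch of per-subject accuracy followed by an unweighted (macro) average across subjects. The averaging scheme is an assumption, since this excerpt does not say how the paper aggregates scores. The record layout and the subject names are hypothetical.

```python
from collections import defaultdict


def subject_accuracies(records):
    """Compute accuracy per subject.

    records: iterable of (subject, predicted_choice, gold_choice) tuples.
    This layout is an assumption, not the dataset's actual schema.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for subject, predicted, gold in records:
        total[subject] += 1
        if predicted == gold:
            correct[subject] += 1
    return {subject: correct[subject] / total[subject] for subject in total}


def macro_average(per_subject):
    # Unweighted mean: every subject counts equally, whatever its size.
    return sum(per_subject.values()) / len(per_subject)


# Hypothetical toy predictions for two subjects.
records = [
    ("physics", "A", "A"),
    ("physics", "B", "C"),
    ("law", "D", "D"),
]
per_subject = subject_accuracies(records)
overall = macro_average(per_subject)
```

A macro average keeps subjects with many questions from dominating the headline number. A micro average over all questions would weight larger subjects more heavily.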
