Advancing the Evaluation of Traditional Chinese Language Models: Towards a Comprehensive Benchmark Suite

Chan-Jan Hsu; Chang-Le Liu; Feng-Ting Liao; Po-Chun Hsu; Yi-Chang Chen; Da-shan Shiu

arXiv:2309.08448 · cs.CL · October 3, 2023 · 1 citation

TL;DR

This paper introduces a comprehensive benchmark suite for evaluating Traditional Chinese language models, covering diverse tasks and enabling fair comparison of models like GPT-3.5, Taiwan-LLaMa, and proprietary models.

Contribution

It presents a new set of benchmarks tailored for Traditional Chinese that leverages existing datasets and covers multiple tasks, filling a gap in the field.

Findings

1. Model 7-C performs comparably to GPT-3.5 on several tasks.
2. The benchmark suite is open-sourced for community use.
3. Evaluation across diverse tasks highlights strengths and weaknesses of models.

Abstract

The evaluation of large language models is an essential task in the field of language understanding and generation. As language models continue to advance, the need for effective benchmarks to assess their performance has become imperative. In the context of Traditional Chinese, there is a scarcity of comprehensive and diverse benchmarks to evaluate the capabilities of language models, despite the existence of certain benchmarks such as DRCD, TTQA, CMDQA, and FGC dataset. To address this gap, we propose a novel set of benchmarks that leverage existing English datasets and are tailored to evaluate language models in Traditional Chinese. These benchmarks encompass a wide range of tasks, including contextual question-answering, summarization, classification, and table understanding. The proposed benchmarks offer a comprehensive evaluation framework, enabling the assessment of language…

Code & Models

Repositories

mtkresearch/mr-models
Official

Datasets

MediaTek-Research/TCEval-v2
dataset · 1.8k downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Computational and Text Analysis Methods

Methods: Attention Is All You Need · Softmax · Dense Connections · Linear Layer · Attention Dropout · Residual Connection · Adam