Advancing the Evaluation of Traditional Chinese Language Models: Towards a Comprehensive Benchmark Suite
Chan-Jan Hsu, Chang-Le Liu, Feng-Ting Liao, Po-Chun Hsu, Yi-Chang Chen, Da-shan Shiu

TL;DR
This paper introduces a comprehensive benchmark suite for evaluating Traditional Chinese language models, covering diverse tasks and enabling fair comparison of models like GPT-3.5, Taiwan-LLaMa, and proprietary models.
Contribution
It presents a new set of benchmarks tailored to Traditional Chinese. The benchmarks build on existing datasets and cover multiple tasks, filling a gap in a field where no such suite previously existed.
Findings
Model 7-C performs comparably to GPT-3.5 on several tasks.
The benchmark suite is open-sourced for community use.
Evaluation across diverse tasks highlights strengths and weaknesses of models.
Abstract
The evaluation of large language models is an essential task in the field of language understanding and generation. As language models continue to advance, the need for effective benchmarks to assess their performance has become imperative. In the context of Traditional Chinese, comprehensive and diverse benchmarks for evaluating the capabilities of language models are scarce, despite the existence of certain benchmarks such as DRCD, TTQA, CMDQA, and the FGC dataset. To address this gap, we propose a novel set of benchmarks that leverage existing English datasets and are tailored to evaluate language models in Traditional Chinese. These benchmarks encompass a wide range of tasks, including contextual question-answering, summarization, classification, and table understanding. The proposed benchmarks offer a comprehensive evaluation framework, enabling the assessment of language…
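The core idea of a shared evaluation framework is that every model sees the same prompts and is scored with the same metric per task, so scores are directly comparable and per-task strengths and weaknesses become visible. The sketch below is illustrative only and is not the authors' released code: the `Task` and `evaluate` names, the toy examples, and the choice of exact-match scoring are all hypothetical assumptions. Tasks such as summarization would need a different metric plugged into the `metric` field.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Hypothetical types (not from the paper): a "model" is any function that
# maps a prompt string to an output string.
Model = Callable[[str], str]
Example = Tuple[str, str]             # (prompt, reference answer)
Metric = Callable[[str, str], float]  # (prediction, reference) -> score in [0, 1]


def exact_match(prediction: str, reference: str) -> float:
    """Return 1.0 if the whitespace-stripped prediction equals the reference, else 0.0."""
    return 1.0 if prediction.strip() == reference.strip() else 0.0


@dataclass
class Task:
    """A hypothetical benchmark task: a name, its examples, and its scoring metric."""
    name: str
    examples: List[Example]
    metric: Metric = exact_match


def evaluate(models: Dict[str, Model], tasks: List[Task]) -> Dict[str, Dict[str, float]]:
    """Run every model on every task with identical prompts and metrics.

    Returns results[model_name][task_name] = mean score over the task's examples.
    """
    results: Dict[str, Dict[str, float]] = {}
    for model_name, model in models.items():
        results[model_name] = {}
        for task in tasks:
            scores = [task.metric(model(prompt), ref) for prompt, ref in task.examples]
            results[model_name][task.name] = sum(scores) / len(scores) if scores else 0.0
    return results


# Toy usage with made-up data: a trivial "model" that always answers "positive".
toy_tasks = [
    Task("classification", [("這部電影很好看", "positive"), ("服務很差", "negative")]),
]
toy_models = {"always_positive": lambda prompt: "positive"}
print(evaluate(toy_models, toy_tasks))  # {'always_positive': {'classification': 0.5}}
```

Keeping the metric on the task rather than on the model is the design choice that enforces a fair comparison: no model can be scored under a more lenient rule than another.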
