ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models

Baoli Zhang; Haining Xie; Pengfan Du; Junhao Chen; Pengfei Cao; Yubo Chen; Shengping Liu; Kang Liu; Jun Zhao

arXiv:2308.14353 · cs.CL · August 29, 2023

TL;DR

ZhuJiu is a comprehensive Chinese benchmark that evaluates large language models across multiple abilities with diverse evaluation methods and addresses data leakage, with the aim of providing accurate and systematic assessment results.

Contribution

We introduce ZhuJiu, a multi-dimensional, multi-faceted Chinese benchmark for LLM evaluation that covers 7 ability dimensions and 51 tasks, with a new focus on knowledge ability.

Findings

01. Evaluated 10 mainstream LLMs with the ZhuJiu benchmark.

02. Demonstrated the effectiveness of the multi-method evaluation approach.

03. Provided a publicly accessible benchmark and leaderboard.

Abstract

The unprecedented performance of large language models (LLMs) requires comprehensive and accurate evaluation. We argue that, for LLM evaluation, benchmarks need to be comprehensive and systematic. To this end, we propose the ZhuJiu benchmark, which has the following strengths: (1) Multi-dimensional ability coverage: We comprehensively evaluate LLMs across 7 ability dimensions covering 51 tasks. In particular, we also propose a new benchmark that focuses on the knowledge ability of LLMs. (2) Collaboration of multi-faceted evaluation methods: We use 3 different yet complementary evaluation methods to comprehensively evaluate LLMs, which can ensure the authority and accuracy of the evaluation results. (3) Comprehensive Chinese benchmark: ZhuJiu is the pioneering benchmark that fully assesses LLMs in Chinese, while also providing equally robust evaluation abilities in English. (4) Avoiding potential…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification