ZhuJiu: A Multi-dimensional, Multi-faceted Chinese Benchmark for Large Language Models
Baoli Zhang, Haining Xie, Pengfan Du, Junhao Chen, Pengfei Cao, Yubo Chen, Shengping Liu, Kang Liu, Jun Zhao

TL;DR
ZhuJiu is a comprehensive Chinese benchmark for evaluating large language models across multiple abilities, using diverse methods, and addressing data leakage, to provide accurate and systematic assessment results.
Contribution
We introduce ZhuJiu, a multi-dimensional, multi-faceted Chinese benchmark for LLM evaluation that covers 7 ability dimensions and 51 tasks, with a new focus on knowledge ability.
Findings
Evaluated 10 mainstream LLMs with the ZhuJiu benchmark.
Demonstrated the effectiveness of the multi-method evaluation approach.
Provided a publicly accessible benchmark and leaderboard.
Abstract
The unprecedented performance of large language models (LLMs) requires comprehensive and accurate evaluation. We argue that for LLM evaluation, benchmarks need to be comprehensive and systematic. To this end, we propose the ZhuJiu benchmark, which has the following strengths: (1) Multi-dimensional ability coverage: We comprehensively evaluate LLMs across 7 ability dimensions covering 51 tasks. In particular, we also propose a new benchmark that focuses on the knowledge ability of LLMs. (2) Multi-faceted evaluation methods collaboration: We use 3 different yet complementary evaluation methods to comprehensively evaluate LLMs, which can ensure the authority and accuracy of the evaluation results. (3) Comprehensive Chinese benchmark: ZhuJiu is the pioneering benchmark that fully assesses LLMs in Chinese, while also providing equally robust evaluation abilities in English. (4) Avoiding potential…
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification
