CLEVA: Chinese Language Models EVAluation Platform
Yanyang Li, Jianqiao Zhao, Duo Zheng, Zi-Yuan Hu, Zhi Chen, Xiaohui, Su, Yongfeng Huang, Shijia Huang, Dahua Lin, Michael R. Lyu, Liwei Wang

TL;DR
CLEVA is a comprehensive, user-friendly platform designed to evaluate Chinese language models across multiple dimensions, addressing challenges like standardization, contamination, and comparability in current Chinese LLM assessments.
Contribution
CLEVA introduces a standardized evaluation workflow, a regularly updated leaderboard, and contamination mitigation strategies for Chinese LLMs, enhancing fairness and comprehensiveness.
Findings
Validated with 23 Chinese LLMs demonstrating effectiveness.
Provides a standardized, easy-to-use evaluation process.
Ensures data freshness and reduces contamination risks.
Abstract
With the continuous emergence of Chinese Large Language Models (LLMs), how to evaluate a model's capabilities has become an increasingly significant issue. The absence of a comprehensive Chinese benchmark that thoroughly assesses a model's performance, the unstandardized and incomparable prompting procedure, and the prevalent risk of contamination pose major challenges in the current evaluation of Chinese LLMs. We present CLEVA, a user-friendly platform crafted to holistically evaluate Chinese LLMs. Our platform employs a standardized workflow to assess LLMs' performance across various dimensions, regularly updating a competitive leaderboard. To alleviate contamination, CLEVA curates a significant proportion of new data and develops a sampling strategy that guarantees a unique subset for each leaderboard round. Empowered by an easy-to-use interface that requires just a few mouse clicks…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsTopic Modeling · Natural Language Processing Techniques
