Evaluating the Generation Capabilities of Large Chinese Language Models

Hui Zeng; Jingyuan Xue; Meng Hao; Chen Sun; Bin Ning; Na Zhang

arXiv:2308.04823 · cs.CL · January 31, 2024

TL;DR

This paper introduces CG-Eval, an automated framework for assessing large Chinese language models across multiple academic domains, featuring the novel Gscore metric for comprehensive performance evaluation.

Contribution

It presents the first automated, multi-domain evaluation framework for Chinese language models and introduces Gscore, a new composite index for nuanced performance measurement.

Findings

1. Demonstrates the effectiveness of CG-Eval across six key domains.

2. Provides a comparative analysis of different Chinese language models.

3. Makes detailed test data and results available online.

Abstract

This paper unveils CG-Eval, the first-ever comprehensive and automated evaluation framework designed for assessing the generative capabilities of large Chinese language models across a spectrum of academic disciplines. CG-Eval stands out for its automated process, which critically assesses models based on their proficiency in generating precise and contextually relevant responses to a diverse array of questions within six key domains: Science and Engineering, Humanities and Social Sciences, Mathematical Calculations, Medical Practitioner Qualification Examination, Judicial Examination, and Certified Public Accountant Examination. Alongside this, we introduce Gscore, an innovative composite index developed from a weighted sum of multiple metrics. Gscore uniquely automates the quality measurement of a model's text generation against reference standards, providing a detailed and nuanced…
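The abstract describes Gscore only as a weighted sum of multiple metrics, so the following is a minimal sketch of that general idea rather than the paper's formula. The function name `composite_score`, the metric names `overlap` and `similarity`, and the equal weights are all hypothetical placeholders; the metrics and weights CG-Eval actually uses are not given in the text above.

```python
from typing import Mapping


def composite_score(metrics: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Return a weighted sum of per-metric scores, with weights normalized to sum to 1."""
    if set(metrics) != set(weights):
        raise ValueError("metrics and weights must have the same keys")
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the weight total keeps the result on the same scale as the inputs.
    return sum(weights[name] * metrics[name] for name in metrics) / total


# Hypothetical placeholder metrics and weights, NOT the values used by CG-Eval.
example_metrics = {"overlap": 0.5, "similarity": 1.0}
example_weights = {"overlap": 1.0, "similarity": 1.0}
example = composite_score(example_metrics, example_weights)
```

Normalizing by the weight total is a design choice made here so that scores in the range 0 to 1 yield a composite in the same range; a framework could equally fix weights that already sum to 1.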

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling