FoundaBench: Evaluating Chinese Fundamental Knowledge Capabilities of Large Language Models

Wei Li; Ren Ma; Jiang Wu; Chenya Gu; Jiahui Peng; Jinyang Len; Songyang Zhang; Hang Yan; Dahua Lin; Conghui He

arXiv:2404.18359 · cs.CL · April 30, 2024

TL;DR

FoundaBench is a comprehensive benchmark for evaluating Chinese large language models' fundamental knowledge across diverse subjects, revealing performance disparities and guiding future improvements.

Contribution

This paper introduces FoundaBench, a new benchmark with 3354 questions to assess Chinese LLMs' fundamental knowledge, employing novel evaluation protocols.

Findings

01. Models pre-trained on Chinese corpora outperform the others.

02. There is a significant gap between models' reasoning and memory-recall abilities.

03. FoundaBench sets a new standard for LLM knowledge evaluation.

Abstract

In the burgeoning field of large language models (LLMs), the assessment of fundamental knowledge remains a critical challenge, particularly for models tailored to Chinese language and culture. This paper introduces FoundaBench, a pioneering benchmark designed to rigorously evaluate the fundamental knowledge capabilities of Chinese LLMs. FoundaBench encompasses a diverse array of 3354 multiple-choice questions across common sense and K-12 educational subjects, meticulously curated to reflect the breadth and depth of everyday and academic knowledge. We present an extensive evaluation of 12 state-of-the-art LLMs using FoundaBench, employing both traditional assessment methods and our CircularEval protocol to mitigate potential biases in model responses. Our results highlight the superior performance of models pre-trained on Chinese corpora, and reveal a significant disparity between…
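The abstract does not spell out how CircularEval works. As a minimal sketch, assuming the protocol follows the circular-rotation idea used for multiple-choice evaluation elsewhere (the options are shifted through every position, and a question counts as correct only if the model picks the right option under every shift), it could look like the following. The names `rotations`, `circular_eval`, and the `model` callable are hypothetical and are not taken from the paper.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical model interface: takes a question and its options,
# returns the index of the option it selects.
Model = Callable[[str, List[str]], int]


def rotations(options: List[str], answer_idx: int) -> Iterator[Tuple[List[str], int]]:
    """Yield every circular shift of the options with the updated answer index."""
    n = len(options)
    for shift in range(n):
        rotated = [options[(i + shift) % n] for i in range(n)]
        # The correct option moves from answer_idx to (answer_idx - shift) mod n.
        yield rotated, (answer_idx - shift) % n


def circular_eval(question: str, options: List[str], answer_idx: int, model: Model) -> bool:
    """Return True only if the model is correct under every rotation (assumed rule)."""
    return all(model(question, rotated) == idx
               for rotated, idx in rotations(options, answer_idx))


if __name__ == "__main__":
    opts = ["Beijing", "Shanghai", "Guangzhou", "Shenzhen"]
    always_first: Model = lambda q, o: 0               # purely position-biased
    knows_answer: Model = lambda q, o: o.index("Beijing")
    print(circular_eval("Capital of China?", opts, 0, always_first))
    print(circular_eval("Capital of China?", opts, 0, knows_answer))
```

Under this assumed rule, a model that always picks the first option would score on a single-pass evaluation whenever the answer happens to sit first, but fails here because the answer moves with each shift. That is the kind of positional bias the abstract says the protocol is meant to mitigate.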

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Sparse Evolutionary Training