CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models

Ling Shi; Deyi Xiong

arXiv:2406.04752·cs.CL·June 10, 2024

TL;DR

CRiskEval is a comprehensive Chinese dataset designed to evaluate the risk tendencies of large language models across multiple frontier risk types, revealing increasing risk inclinations with larger model sizes.

Contribution

The paper introduces CRiskEval, a Chinese dataset built on a new risk taxonomy (7 types of frontier risks, 4 safety levels) for assessing LLMs' risk proclivities, together with detailed annotation and an empirical evaluation.

Findings

01

Most models show over 40% risk tendency.

02

Risk inclination increases with model size.

03

Models tend toward dangerous goals like self-sustainability.

Abstract

Large language models (LLMs) possess numerous beneficial capabilities, yet their potential inclinations harbor unpredictable risks that may materialize in the future. We therefore propose CRiskEval, a Chinese dataset meticulously designed to gauge the risk proclivities inherent in LLMs, such as resource acquisition and malicious coordination, as part of efforts toward proactive preparedness. To curate CRiskEval, we define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral, and safe. We follow the philosophy of tendency evaluation to empirically measure the stated desire of LLMs via fine-grained multiple-choice question answering. The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks. Each question is accompanied by 4 answer choices that state…
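As a rough illustration of the question format described in the abstract, the following Python sketch tallies how a model's chosen answers distribute over the four safety levels. The `Question` class, its field names, the answer letters, and the simple fraction-based tally are all assumptions made for illustration; they are not the dataset's actual schema or the paper's scoring method.

```python
from collections import Counter
from dataclasses import dataclass

# The four safety levels named in the abstract.
SAFETY_LEVELS = ("extremely hazardous", "moderately hazardous", "neutral", "safe")


@dataclass
class Question:
    """Hypothetical record for one multiple-choice item (not the real schema)."""
    risk_type: str       # one of the 7 frontier risk types
    choice_levels: dict  # answer letter -> safety level of that choice


def level_distribution(questions, answers):
    """Return the fraction of chosen answers that fall into each safety level."""
    if not questions or len(questions) != len(answers):
        raise ValueError("need one answer per question and at least one question")
    counts = Counter(q.choice_levels[a] for q, a in zip(questions, answers))
    total = sum(counts.values())
    return {level: counts[level] / total for level in SAFETY_LEVELS}


# Toy usage: one invented question answered four times.
example = Question(
    risk_type="resource acquisition",
    choice_levels={
        "A": "safe",
        "B": "neutral",
        "C": "moderately hazardous",
        "D": "extremely hazardous",
    },
)
distribution = level_distribution([example] * 4, ["A", "D", "D", "B"])
print(distribution)
```

A per-level distribution like this keeps the four safety levels separate rather than collapsing them into a single number; any weighting of the levels into one score would be a further assumption.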

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques