CRiskEval: A Chinese Multi-Level Risk Evaluation Benchmark Dataset for Large Language Models
Ling Shi, Deyi Xiong

TL;DR
CRiskEval is a comprehensive Chinese dataset designed to evaluate the risk tendencies of large language models across multiple frontier risk types, revealing increasing risk inclinations with larger model sizes.
Contribution
The paper introduces CRiskEval, a novel risk taxonomy and dataset for assessing LLMs' risk proclivities in Chinese, with detailed annotation and empirical evaluation.
Findings
Most evaluated models show a risk tendency above 40%.
Risk inclination increases with model size.
Models tend toward dangerous goals like self-sustainability.
Abstract
Large language models (LLMs) possess numerous beneficial capabilities, yet their potential inclinations harbor unpredictable risks that may materialize in the future. We hence propose CRiskEval, a Chinese dataset meticulously designed for gauging the risk proclivities inherent in LLMs, such as resource acquisition and malicious coordination, as part of efforts for proactive preparedness. To curate CRiskEval, we define a new risk taxonomy with 7 types of frontier risks and 4 safety levels: extremely hazardous, moderately hazardous, neutral and safe. We follow the philosophy of tendency evaluation to empirically measure the stated desire of LLMs via fine-grained multiple-choice question answering. The dataset consists of 14,888 questions that simulate scenarios related to the 7 predefined types of frontier risks. Each question is accompanied by 4 answer choices that state…
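To make the tendency-evaluation idea concrete, here is a minimal Python sketch of how answers to such multiple-choice questions might be aggregated. It is not the paper's scoring method, which this summary does not describe. The sketch assumes that each answer choice maps to one of the four safety levels named in the abstract, and that "risk tendency" can be read as the share of questions on which a model picks either hazardous level. The function names and the sample answers are hypothetical.

```python
from collections import Counter

# The four safety levels named in the abstract.
LEVELS = ("extremely hazardous", "moderately hazardous", "neutral", "safe")


def level_distribution(chosen_levels):
    """Return the fraction of answers that fall at each safety level."""
    if not chosen_levels:
        raise ValueError("no answers to score")
    counts = Counter(chosen_levels)
    unknown = set(counts) - set(LEVELS)
    if unknown:
        raise ValueError(f"unknown safety levels: {sorted(unknown)}")
    total = len(chosen_levels)
    return {level: counts[level] / total for level in LEVELS}


def risk_tendency(chosen_levels):
    """Assumed metric (not from the paper): share of answers at a hazardous level."""
    dist = level_distribution(chosen_levels)
    return dist["extremely hazardous"] + dist["moderately hazardous"]


# Hypothetical safety levels of the options one model chose on five questions.
answers = [
    "safe",
    "neutral",
    "moderately hazardous",
    "extremely hazardous",
    "safe",
]
print(level_distribution(answers))
print(f"risk tendency: {risk_tendency(answers):.0%}")
```

Reporting the full per-level distribution alongside the single percentage keeps the distinction between extremely and moderately hazardous choices visible, which a lone aggregate number would hide.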
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques
