ChineseSafe: A Chinese Benchmark for Evaluating Safety in Large Language Models

Hengxiang Zhang; Hongfu Gao; Qiang Hu; Guanhua Chen; Lili Yang; Bingyi Jing; Hongxin Wei; Bing Wang; Haifeng Bai; Lei Yang

arXiv:2410.18491·cs.CL·April 15, 2025

TL;DR

ChineseSafe is a comprehensive benchmark designed to evaluate the safety of large language models in Chinese, focusing on illegal and unsafe content detection, including politically sensitive and pornographic material, to improve content moderation and legal compliance.

Contribution

This work introduces ChineseSafe, a large Chinese safety benchmark with over 200,000 examples, addressing the lack of Chinese-specific safety evaluation for LLMs and including new illegal content categories.

Findings

01. Many LLMs are vulnerable to safety issues in Chinese contexts.

02. Current models pose legal risks due to safety vulnerabilities.

03. Benchmark results guide safer LLM development.

Abstract

With the rapid development of large language models (LLMs), understanding their capability to identify unsafe content has become increasingly important. While previous works have introduced several benchmarks to evaluate the safety risks of LLMs, the community still has a limited understanding of current LLMs' capability to recognize illegal and unsafe content in Chinese contexts. In this work, we present a Chinese safety benchmark (ChineseSafe) to facilitate research on the content safety of large language models. To align with the regulations for Chinese Internet content moderation, ChineseSafe contains 205,034 examples across 4 classes and 10 sub-classes of safety issues. For Chinese contexts, we add several special types of illegal content: political sensitivity, pornography, and variant/homophonic words. Moreover, we employ two methods to evaluate the legal risks of…
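As a rough illustration of what evaluating a model's ability to identify unsafe content can look like, the sketch below scores a binary safe/unsafe classifier against labeled examples. It is not the paper's evaluation method, which the truncated abstract does not describe; the `Example` type, the `"safe"`/`"unsafe"` label names, the `keyword_predictor` stand-in for an LLM, and the toy English sentences are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Example:
    text: str
    label: str  # hypothetical label names: "safe" or "unsafe"


def score(examples: List[Example], predict: Callable[[str], str]) -> Dict[str, float]:
    """Compute accuracy, plus precision and recall on the "unsafe" class."""
    tp = fp = fn = tn = 0
    for ex in examples:
        pred = predict(ex.text)
        if ex.label == "unsafe" and pred == "unsafe":
            tp += 1
        elif ex.label == "safe" and pred == "unsafe":
            fp += 1
        elif ex.label == "unsafe" and pred == "safe":
            fn += 1
        else:
            tn += 1
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


def keyword_predictor(text: str) -> str:
    """Toy stand-in for a model: flags any text containing the word 'bomb'."""
    return "unsafe" if "bomb" in text.lower() else "safe"


# Toy data, invented for this sketch; not drawn from the benchmark.
toy_examples = [
    Example("how to bake bread", "safe"),
    Example("how to build a bomb", "unsafe"),
    Example("bomb disposal history", "safe"),
    Example("buy illegal weapons", "unsafe"),
]

metrics = score(toy_examples, keyword_predictor)
print(metrics)
```

On this toy data the keyword rule produces one false positive and one false negative, which is the kind of failure a labeled benchmark is meant to expose. A real evaluation would replace `keyword_predictor` with a call to the model under test.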

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: ALIGN