ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark

Kangwei Liu; Siyuan Cheng; Bozhong Tian; Xiaozhuan Liang; Yuyang Yin; Meng Han; Ningyu Zhang; Bryan Hooi; Xi Chen; Shumin Deng

arXiv:2506.10960·cs.CL·August 14, 2025

TL;DR

This paper introduces ChineseHarm-Bench, a comprehensive benchmark for detecting harmful content in Chinese, including a knowledge rule base and a knowledge-augmented baseline to improve detection performance.

Contribution

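The paper provides a professionally annotated Chinese harmful content dataset built from real-world data, an explicit knowledge rule base derived from the annotation process, and a baseline that combines human-annotated knowledge rules with the implicit knowledge of LLMs.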

Findings

01 The benchmark covers six harmful content categories.

02 The knowledge-augmented baseline improves detection accuracy.

03 Code and data are publicly available.

Abstract

Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models,…
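
The idea of giving an LLM explicit expert rules at detection time can be sketched as prompt construction. The following is a minimal illustration only, not the authors' implementation: the `KnowledgeRule` type, the `build_prompt` and `parse_label` helpers, the placeholder category names, and the rule wording are all hypothetical, and the call to an actual model is deliberately left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class KnowledgeRule:
    """One entry of a hypothetical rule base (names are placeholders)."""
    category: str     # placeholder label, not one of the benchmark's real categories
    description: str  # expert-written guidance for recognizing this category


# Illustrative rule base; the real rules come from the benchmark's annotation process.
RULES: List[KnowledgeRule] = [
    KnowledgeRule("category_a", "Placeholder rule text describing category A."),
    KnowledgeRule("category_b", "Placeholder rule text describing category B."),
]


def build_prompt(text: str, rules: List[KnowledgeRule]) -> str:
    """Assemble a classification prompt that lists every rule before the input text."""
    labels = ", ".join(rule.category for rule in rules)
    rule_lines = [f"- {rule.category}: {rule.description}" for rule in rules]
    return "\n".join(
        [
            f"Classify the text into exactly one of: {labels}.",
            "Knowledge rules:",
            *rule_lines,
            f"Text: {text}",
            "Answer with the category name only.",
        ]
    )


def parse_label(response: str, rules: List[KnowledgeRule]) -> Optional[str]:
    """Return the first known category mentioned in a model response, else None."""
    for rule in rules:
        if rule.category in response:
            return rule.category
    return None
```

In this sketch the string returned by `build_prompt` would be sent to whichever model is being evaluated, and `parse_label` would map its free-text reply back to a label; unmatched replies return `None` so they can be counted separately rather than silently assigned to a category.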

Code & Models

Repositories

zjunlp/chineseharm-bench
PyTorch · Official

Datasets

zjunlp/ChineseHarm-bench
dataset · 101 downloads

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining

Methods: Balanced Selection