ChineseHarm-Bench: A Chinese Harmful Content Detection Benchmark
Kangwei Liu, Siyuan Cheng, Bozhong Tian, Xiaozhuan Liang, Yuyang Yin, Meng Han, Ningyu Zhang, Bryan Hooi, Xi Chen, Shumin Deng

TL;DR
This paper introduces ChineseHarm-Bench, a comprehensive benchmark for detecting harmful content in Chinese, together with a knowledge rule base and a knowledge-augmented baseline intended to improve detection performance.
Contribution
The paper provides the first large-scale, annotated Chinese harmful content dataset, paired with an explicit knowledge rule base and a novel baseline that combines human-annotated rules with the implicit knowledge of LLMs.
Findings
The benchmark covers six harmful content categories.
The knowledge-augmented baseline improves detection accuracy.
Code and data are publicly available.
Abstract
Large language models (LLMs) have been increasingly applied to automated harmful content detection tasks, assisting moderators in identifying policy violations and improving the overall efficiency and accuracy of content review. However, existing resources for harmful content detection are predominantly focused on English, with Chinese datasets remaining scarce and often limited in scope. We present a comprehensive, professionally annotated benchmark for Chinese content harm detection, which covers six representative categories and is constructed entirely from real-world data. Our annotation process further yields a knowledge rule base that provides explicit expert knowledge to assist LLMs in Chinese harmful content detection. In addition, we propose a knowledge-augmented baseline that integrates both human-annotated knowledge rules and implicit knowledge from large language models,…
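The abstract describes supplying explicit, human-annotated knowledge rules to an LLM during detection. The sketch below illustrates one plausible way to do that with rule-augmented prompting. It is not the authors' implementation: the rule base contents, the category names (`category_a`, `category_b`), the `non-violation` fallback label, and the helper names `build_prompt` and `parse_label` are all hypothetical placeholders chosen for illustration.

```python
from typing import Dict, List

# Hypothetical rule base. Category names and rule texts are placeholders,
# not the benchmark's actual categories or rules.
RULE_BASE: Dict[str, List[str]] = {
    "category_a": [
        "placeholder rule 1 for category_a",
        "placeholder rule 2 for category_a",
    ],
    "category_b": ["placeholder rule 1 for category_b"],
}

# Assumed fallback label for content that matches no harmful category.
FALLBACK_LABEL = "non-violation"


def build_prompt(text: str, rule_base: Dict[str, List[str]]) -> str:
    """Assemble a classification prompt listing each category's rules before the input."""
    lines = [
        f"Classify the text into one of the categories below, or '{FALLBACK_LABEL}'.",
        "",
    ]
    for category, rules in rule_base.items():
        lines.append(f"[{category}]")
        lines.extend(f"- {rule}" for rule in rules)
        lines.append("")
    lines.append(f"Text: {text}")
    lines.append("Answer with a single category name.")
    return "\n".join(lines)


def parse_label(response: str, rule_base: Dict[str, List[str]]) -> str:
    """Map a free-form model response to a known category, else the fallback label."""
    cleaned = response.strip().lower()
    for category in rule_base:
        if category in cleaned:
            return category
    return FALLBACK_LABEL
```

In use, the string returned by `build_prompt` would be sent to whichever LLM is being evaluated, and the raw reply passed through `parse_label`. Keeping the rules in a plain dictionary makes it easy to compare runs with and without the rule text in the prompt.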
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining
Methods: Balanced Selection
