Chinese Cyberbullying Detection: Dataset, Method, and Validation

Yi Zhu; Xin Zou; Xindong Wu

arXiv:2505.20654·cs.CL·May 12, 2026

TL;DR

This paper introduces a new Chinese cyberbullying incident dataset, along with a novel annotation method and evaluation criteria, to improve detection and prediction of cyberbullying incidents in social media comments.

Contribution

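It presents the first Chinese cyberbullying incident dataset (CHNCI), an annotation approach in which an ensemble of three explanation-based detection methods generates pseudo-labels for human annotators to verify, and evaluation criteria for validating whether a case constitutes a cyberbullying incident.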

Findings

01. The dataset contains 220,676 comments across 91 incidents.

02. An ensemble of detection methods effectively generates pseudo-labels for human annotation.

03. The dataset serves as a benchmark for cyberbullying incident detection and prediction.

Abstract

Existing cyberbullying detection benchmarks are organized by the polarity of speech, such as "offensive" and "non-offensive", which essentially reduces the task to hate speech detection. In the real world, however, cyberbullying often attracts widespread social attention through incidents. To address this gap, we propose a novel annotation method for constructing a cyberbullying dataset organized by incident. The resulting CHNCI is the first Chinese cyberbullying incident detection dataset, consisting of 220,676 comments across 91 incidents. Specifically, we first combine three explanation-generation-based cyberbullying detection methods into an ensemble that generates pseudo-labels, and then have human annotators judge these labels. We then propose evaluation criteria for validating whether a case constitutes a cyberbullying incident. Experimental results demonstrate that the…
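The abstract does not specify how the outputs of the three detectors are combined. The Python below is a minimal sketch of the pseudo-labeling step followed by human verification, assuming simple majority voting over binary labels. The detectors, `PseudoLabel`, `ensemble_pseudo_label`, and `human_review` are hypothetical names chosen for illustration. The paper's explanation-based detectors are not reproduced here.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical detector type: maps a comment to a binary label
# (1 = cyberbullying, 0 = not). Any callables with this shape can stand in
# for the paper's explanation-based detectors, which are not reproduced here.
Detector = Callable[[str], int]


@dataclass
class PseudoLabel:
    comment: str
    label: int                   # majority-vote label from the ensemble
    agreement: float             # fraction of detectors that voted for `label`
    final: Optional[int] = None  # set after human review


def ensemble_pseudo_label(comment: str, detectors: List[Detector]) -> PseudoLabel:
    """Combine detector votes by simple majority (an assumed aggregation rule)."""
    votes = [detect(comment) for detect in detectors]
    label, count = Counter(votes).most_common(1)[0]
    return PseudoLabel(comment, label, count / len(votes))


def human_review(item: PseudoLabel,
                 annotator_accepts: Callable[[PseudoLabel], bool]) -> PseudoLabel:
    """The annotator accepts the pseudo-label or flips it (valid for binary labels only)."""
    item.final = item.label if annotator_accepts(item) else 1 - item.label
    return item


if __name__ == "__main__":
    # Toy keyword rules standing in for real detectors (illustration only)
    detectors = [
        lambda c: int("loser" in c),
        lambda c: int("get lost" in c),
        lambda c: int("loser" in c or "get lost" in c),
    ]
    item = ensemble_pseudo_label("you are such a loser", detectors)
    print(item.label, round(item.agreement, 2))  # → 1 0.67
    print(human_review(item, lambda p: True).final)  # → 1
```

Three voters with binary labels cannot produce a tie, so the majority vote is always defined. The `agreement` score could also be used to send low-agreement comments to annotators first, although the abstract does not say whether CHNCI does this.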
