Chinese Cyberbullying Detection: Dataset, Method, and Validation
Yi Zhu, Xin Zou, Xindong Wu

TL;DR
This paper introduces a new Chinese cyberbullying incident dataset, along with a novel annotation method and evaluation criteria, to improve detection and prediction of cyberbullying incidents in social media comments.
Contribution
It presents the first Chinese cyberbullying incident dataset, an ensemble of detection methods used to generate pseudo-labels for annotation, and criteria for validating whether an incident constitutes cyberbullying.
Findings
The dataset contains 220,676 comments across 91 incidents.
An ensemble of detection methods generates pseudo-labels, which human annotators then verify.
The dataset serves as a benchmark for cyberbullying incident detection and prediction.
Abstract
Existing cyberbullying detection benchmarks are organized by the polarity of speech, such as "offensive" and "non-offensive", which makes them essentially hate speech detection. In the real world, however, cyberbullying often attracts widespread social attention through incidents. To address this gap, we propose a novel annotation method to construct a cyberbullying dataset organized by incidents. The resulting CHNCI is the first Chinese cyberbullying incident detection dataset and consists of 220,676 comments across 91 incidents. Specifically, we first combine three cyberbullying detection methods based on explanation generation into an ensemble that produces pseudo-labels, and then have human annotators judge these labels. We then propose evaluation criteria for validating whether an incident constitutes cyberbullying. Experimental results demonstrate that the…
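The abstract states that three detectors are combined into an ensemble whose pseudo-labels are then judged by humans, but it does not say how the detectors' outputs are fused. The Python sketch below assumes simple majority voting over one label per detector and orders comments by detector agreement so annotators can check the least certain pseudo-labels first. The function names, the label strings, and the review ordering are hypothetical illustrations, not CHNCI's actual pipeline.

```python
from collections import Counter
from typing import List, Tuple


def majority_pseudo_label(votes: List[str]) -> Tuple[str, float]:
    """Return (pseudo_label, agreement) for one comment.

    Assumption: `votes` holds one string label per detector, e.g.
    "bullying" or "non-bullying" (hypothetical label names).
    `agreement` is the fraction of detectors that chose the winning label.
    """
    if not votes:
        raise ValueError("need at least one detector vote")
    label, count = Counter(votes).most_common(1)[0]
    return label, count / len(votes)


def build_review_queue(
    per_comment_votes: List[List[str]],
) -> List[Tuple[int, str, float]]:
    """Pseudo-label every comment, then order the results for human review.

    Assumption (not from the paper): annotators see low-agreement comments
    first, since those pseudo-labels are the most likely to be wrong.
    Each entry is (comment_index, pseudo_label, agreement).
    """
    queue = []
    for idx, votes in enumerate(per_comment_votes):
        label, agreement = majority_pseudo_label(votes)
        queue.append((idx, label, agreement))
    # list.sort is stable, so ties keep their original comment order.
    queue.sort(key=lambda item: item[2])
    return queue


if __name__ == "__main__":
    votes = [
        ["non-bullying", "non-bullying", "non-bullying"],
        ["bullying", "non-bullying", "bullying"],
        ["bullying", "bullying", "bullying"],
    ]
    for entry in build_review_queue(votes):
        print(entry)
```

With three detectors and two labels a tie cannot occur, so every comment receives either a 2-of-3 or a 3-of-3 majority; with more labels or an even number of detectors, an explicit tie-breaking rule would be needed.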
