ChildGuard: A Specialized Dataset for Combatting Child-Targeted Hate Speech

Gautam Siddharth Kashyap; Mohammad Anas Azeez; Rafiq Ali; Zohaib Hasan Siddiqui; Jiechao Gao; and Usman Naseem

arXiv:2506.21613 · cs.CL · July 29, 2025

Open Access

TL;DR

ChildGuard is a large annotated dataset designed to improve the detection of hate speech targeting children on social media (X, Reddit, and YouTube). It addresses a key limitation of earlier datasets, which focus on adults.

Contribution

The paper introduces ChildGuard, the first large-scale English dataset for child-targeted hate speech. It labels examples by three age groups and benchmarks existing models to show where they struggle.

Findings

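01. State-of-the-art hate speech models perform poorly on ChildGuard.

02. The dataset includes 351,877 examples from three platforms: X (formerly Twitter), Reddit, and YouTube.

03. Two subsets, contextual (157K) and lexical (194K), enable fine-grained linguistic and contextual analysis.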

Abstract

Hate speech targeting children on social media is a serious and growing problem, yet current NLP systems struggle to detect it effectively. This gap exists mainly because existing datasets focus on adults, lack age-specific labels, miss nuanced linguistic cues, and are often too small for robust modeling. To address this, we introduce ChildGuard, the first large-scale English dataset dedicated to hate speech aimed at children. It contains 351,877 annotated examples from X (formerly Twitter), Reddit, and YouTube, labeled by three age groups: younger children (under 11), pre-teens (11–12), and teens (13–17). The dataset is split into two subsets for fine-grained analysis: a contextual subset (157K) focusing on discourse-level features, and a lexical subset (194K) emphasizing word-level sentiment and vocabulary. Benchmarking state-of-the-art hate speech models on ChildGuard reveals…
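The abstract defines three age bands with fixed boundaries. As a purely illustrative sketch, the bands can be encoded as below. The names `AgeGroup` and `age_group` are hypothetical and not part of any ChildGuard release. ChildGuard labels each text by the age group it targets rather than by a numeric age, so this only makes the band boundaries (under 11, 11–12, 13–17) explicit.

```python
from enum import Enum


class AgeGroup(Enum):
    """Hypothetical encoding of the three age groups named in the abstract.

    Not the dataset's actual label format; names are illustrative only.
    """
    YOUNGER_CHILDREN = "younger children (under 11)"
    PRE_TEENS = "pre-teens (11-12)"
    TEENS = "teens (13-17)"


def age_group(age: int) -> AgeGroup:
    """Map an integer age to its age band (hypothetical helper).

    Raises ValueError for ages outside 0-17, since the dataset concerns minors.
    """
    if not 0 <= age <= 17:
        raise ValueError(f"age {age} is outside the 0-17 range covered here")
    if age < 11:
        return AgeGroup.YOUNGER_CHILDREN
    if age <= 12:
        return AgeGroup.PRE_TEENS
    return AgeGroup.TEENS


if __name__ == "__main__":
    # Boundary ages: 10 and 11, 12 and 13, plus the upper limit 17.
    for a in (10, 11, 12, 13, 17):
        print(a, age_group(a).name)
```

Checking the boundary ages (10/11 and 12/13) is the quickest way to confirm the bands do not overlap and leave no gaps.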

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Bullying, Victimization, and Aggression · Spam and Phishing Detection