Moderating Harm: Benchmarking Large Language Models for Cyberbullying Detection in YouTube Comments

Amel Muminovic

arXiv:2505.18927 · cs.CL · June 3, 2025

TL;DR

This study benchmarks three large language models on cyberbullying detection in YouTube comments written in English, Arabic, and Indonesian, identifying each model's strengths and weaknesses to inform moderation tools.

Contribution

It compares GPT-4.1, Gemini 1.5 Pro, and Claude 3 Opus on multilingual cyberbullying detection and makes the dataset and prompts publicly available.

Findings

01. GPT-4.1 achieved the highest F1 score (0.863).

02. Gemini had the highest recall (0.875) but lower precision (0.767).

03. Claude had the highest precision (0.920) and the fewest false positives.
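
F1 is the harmonic mean of precision and recall, F1 = 2PR / (P + R), so the reported figures can be cross-checked against each other. The minimal Python sketch below recomputes F1 from the precision and recall values stated in the abstract; the function name is illustrative, not taken from the paper. It reproduces GPT-4.1's reported 0.863 and gives an implied F1 of about 0.817 for Gemini, a value the page does not state directly. Claude's recall is cut off in the abstract shown here, so Claude is left out.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported in the abstract.
reported = {
    "GPT-4.1": (0.887, 0.841),
    "Gemini 1.5 Pro": (0.767, 0.875),
}

for model, (p, r) in reported.items():
    print(f"{model}: F1 = {f1_score(p, r):.3f}")
# GPT-4.1: F1 = 0.863
# Gemini 1.5 Pro: F1 = 0.817
```

The harmonic mean penalizes imbalance, which is why Gemini's higher recall does not offset its lower precision.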

Abstract

As online platforms grow, comment sections increasingly host harassment that undermines user experience and well-being. This study benchmarks three leading large language models (OpenAI GPT-4.1, Google Gemini 1.5 Pro, and Anthropic Claude 3 Opus) on a corpus of 5,080 YouTube comments sampled from high-abuse threads in gaming, lifestyle, food vlog, and music channels. The dataset comprises 1,334 harmful and 3,746 non-harmful messages in English, Arabic, and Indonesian, annotated independently by two reviewers with substantial agreement (Cohen's kappa = 0.83). Using a unified prompt and deterministic settings, GPT-4.1 achieved the best overall balance, with an F1 score of 0.863, precision of 0.887, and recall of 0.841. Gemini flagged the highest share of harmful posts (recall = 0.875), but its precision fell to 0.767 due to frequent false positives. Claude delivered the highest precision at…
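
Cohen's kappa corrects raw inter-annotator agreement for the agreement expected by chance: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement implied by each annotator's label frequencies. The minimal Python sketch below illustrates the formula; the function name and the ten toy labels are hypothetical and are not drawn from the paper's annotations.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of items where both annotators chose the same label.
    p_o = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement: product of the two annotators' label rates, summed over labels.
    count_a, count_b = Counter(a), Counter(b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[lab] * count_b[lab] for lab in labels) / (n * n)
    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy labels (1 = harmful, 0 = non-harmful), not the paper's data.
rater_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
rater_b = [1, 0, 0, 0, 1, 0, 0, 1, 1, 0]
print(f"kappa = {cohens_kappa(rater_a, rater_b):.3f}")  # kappa = 0.583
```

In this toy case the annotators agree on 8 of 10 items (p_o = 0.80), but because both label 40% of items as harmful, chance alone predicts p_e = 0.52, so kappa is only about 0.58.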

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Bullying, Victimization, and Aggression · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Layer Normalization · Byte Pair Encoding · Residual Connection · Dense Connections · Softmax · Position-Wise Feed-Forward Layer · Absolute Position Encodings · Label Smoothing