Harmful YouTube Video Detection: A Taxonomy of Online Harm and MLLMs as Alternative Annotators
Claire Wonjeong Jo, Miki Wesołowska, Magdalena Wojcieszak

TL;DR
This paper develops a comprehensive harm taxonomy for online videos and demonstrates that multimodal large language models, especially GPT-4-Turbo, outperform crowdworkers in identifying harmful content across multiple categories.
Contribution
It introduces a detailed harm taxonomy for videos and validates multimodal LLMs as effective annotators, advancing automated harm detection methods.
Findings
GPT-4-Turbo outperforms crowdworkers in harm detection
Multimodal LLMs effectively classify multi-label harms
The taxonomy aids in understanding and mitigating online video harms
Abstract
Short video platforms, such as YouTube, Instagram, or TikTok, are used by billions of users globally. These platforms expose users to harmful content, ranging from clickbait or physical harms to misinformation or online hate. Yet, detecting harmful videos remains challenging due to an inconsistent understanding of what constitutes harm and limited resources and mental tolls involved in human annotation. As such, this study advances measures and methods to detect harm in video content. First, we develop a comprehensive taxonomy for online harm on video platforms, categorizing it into six categories: Information, Hate and harassment, Addictive, Clickbait, Sexual, and Physical harms. Next, we establish multimodal large language models as reliable annotators of harmful videos. We analyze 19,422 YouTube videos using 14 image frames, 1 thumbnail, and text metadata, comparing the accuracy of…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection
