MultiHateClip: A Multilingual Benchmark Dataset for Hateful Video Detection on YouTube and Bilibili

Han Wang; Tan Rui Yang; Usman Naseem; Roy Ka-Wei Lee

arXiv:2408.03468·cs.MM·August 13, 2024

TL;DR

MultiHateClip introduces a multilingual, multimodal dataset for hateful video detection on YouTube and Bilibili, highlighting cultural differences and challenges in current models.

Contribution

The paper presents MultiHateClip, a novel multilingual dataset with detailed annotations, addressing the lack of cross-cultural and multimodal hateful video data.

Findings

1. State-of-the-art models struggle with hateful video detection.
2. Cultural and modality differences impact detection accuracy.
3. Existing models need to be more culturally and multimodally aware.

Abstract

Hate speech is a pressing issue in modern society, with significant effects both online and offline. Recent research in hate speech detection has primarily centered on text-based media, largely overlooking multimodal content such as videos. Existing studies on hateful video datasets have predominantly focused on English content within a Western context and have been limited to binary labels (hateful or non-hateful), lacking detailed contextual information. This study presents MultiHateClip, a novel multilingual dataset created through hate lexicons and human annotation. It aims to enhance the detection of hateful videos on platforms such as YouTube and Bilibili, covering content in both English and Chinese. Comprising 2,000 videos annotated for hatefulness, offensiveness, and normalcy, this dataset provides a cross-cultural perspective on gender-based hate speech. Through…

Code & Models

Repositories

social-ai-studio/multihateclip (Official)
