TL;DR
This paper presents a novel cross-modal transfer learning approach that leverages meme datasets to improve hateful video detection, addressing data scarcity issues with minimal annotation effort and outperforming existing benchmarks.
Contribution
The study introduces a human-assisted reannotation pipeline and demonstrates that meme data can both substitute for and augment video data when training hate speech detection models.
Findings
Meme data can replace video data in resource-scarce scenarios.
Augmenting video datasets with meme data improves detection performance.
The approach outperforms state-of-the-art benchmarks.
Abstract
Detecting hate speech in online content is essential to ensuring safer digital spaces. While significant progress has been made in text and meme modalities, video-based hate speech detection remains under-explored, hindered by a lack of annotated datasets and the high cost of video annotation. This gap is particularly problematic given the growing reliance on large models, which demand substantial amounts of training data. To address this challenge, we leverage meme datasets as both a substitution and an augmentation strategy for training hateful video detection models. Our approach introduces a human-assisted reannotation pipeline to align meme dataset labels with video datasets, ensuring consistency with minimal labeling effort. Using two state-of-the-art vision-language models, we demonstrate that meme data can substitute for video data in resource-scarce scenarios and augment video…
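The substitution and augmentation strategies described in the abstract can be pictured as two ways of assembling a training set. The sketch below is only an illustration under stated assumptions: the `Example` type, the binary label convention, and the `build_training_set` helper are hypothetical names introduced here, not the authors' code, and the memes are assumed to have already been relabeled into the video label scheme by the reannotation pipeline.

```python
from typing import List, NamedTuple


class Example(NamedTuple):
    """One training item. Hypothetical structure, not from the paper."""
    source: str   # "meme" or "video"
    item_id: str
    label: int    # assumed shared scheme: 1 = hateful, 0 = not hateful


def build_training_set(videos: List[Example],
                       memes: List[Example],
                       strategy: str) -> List[Example]:
    """Assemble a training set under one of three illustrative strategies.

    "video_only"   : baseline, train on annotated videos alone
    "substitution" : train on relabeled memes only (no video annotations)
    "augmentation" : train on videos plus relabeled memes
    """
    if strategy == "video_only":
        return list(videos)
    if strategy == "substitution":
        return list(memes)
    if strategy == "augmentation":
        return list(videos) + list(memes)
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    videos = [Example("video", "v1", 1), Example("video", "v2", 0)]
    memes = [Example("meme", "m1", 1), Example("meme", "m2", 0),
             Example("meme", "m3", 0)]
    for name in ("video_only", "substitution", "augmentation"):
        print(name, len(build_training_set(videos, memes, name)))
```

In every case the evaluation data would still be video; only the composition of the training set changes between strategies.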
Taxonomy
Methods: ALIGN
