ImpliHateVid: A Benchmark Dataset and Two-stage Contrastive Learning Framework for Implicit Hate Speech Detection in Videos

Mohammad Zia Ur Rehman; Anukriti Bhatnagar; Omkar Kabde; Shubhi Bansal; Nagendra Kumar

arXiv:2508.06570 · cs.CV · August 18, 2025


TL;DR

This paper introduces ImpliHateVid, a large-scale video dataset for implicit hate speech detection, and proposes a two-stage contrastive learning framework leveraging multimodal features to improve detection accuracy.

Contribution

The work presents one of the first large-scale datasets for implicit hate in videos and a novel two-stage contrastive learning approach for multimodal hate speech detection.

Findings

01 Effective detection of implicit hate speech in videos is demonstrated.
02 Multimodal contrastive learning improves detection accuracy.
03 The proposed method outperforms existing approaches.

Abstract

Existing research has primarily focused on text- and image-based hate speech detection, while video-based approaches remain underexplored. In this work, we introduce a novel dataset, ImpliHateVid, specifically curated for implicit hate speech detection in videos. ImpliHateVid consists of 2,009 videos: 509 implicit hate videos, 500 explicit hate videos, and 1,000 non-hate videos, making it one of the first large-scale video datasets dedicated to implicit hate detection. We also propose a novel two-stage contrastive learning framework for hate speech detection in videos. In the first stage, we train modality-specific encoders for audio, text, and image using a contrastive loss applied to the concatenated features from the three encoders. In the second stage, we train cross-encoders using contrastive learning to refine multimodal representations. Additionally, we incorporate sentiment, emotion,…
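The abstract does not state which contrastive objective the first stage uses. As a hedged illustration only, the sketch below assumes a supervised contrastive (SupCon-style) loss applied to the L2-normalized concatenation of per-video audio, text, and image embeddings. The function names (`fuse`, `supervised_contrastive_loss`), the temperature of 0.1, the label encoding, and the toy two-dimensional vectors are all hypothetical. The encoders themselves and the second-stage cross-encoders are not sketched.

```python
import math
from typing import List, Sequence

# Hypothetical label encoding, chosen for illustration only.
NON_HATE, EXPLICIT, IMPLICIT = 0, 1, 2


def l2_normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit length (an all-zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(audio: Sequence[float], text: Sequence[float], image: Sequence[float]) -> List[float]:
    """Assumed fusion: concatenate the three modality embeddings, then normalize."""
    return l2_normalize(list(audio) + list(text) + list(image))


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def supervised_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    labels: Sequence[int],
    temperature: float = 0.1,
) -> float:
    """SupCon-style loss (an assumption, not confirmed by the abstract):
    each anchor is pulled toward same-label samples and pushed away from
    every other sample in the batch."""
    n = len(embeddings)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [p for p in range(n) if p != i and labels[p] == labels[i]]
        if not positives:
            continue  # an anchor with no same-label partner contributes nothing
        # Temperature-scaled similarities to every other sample in the batch.
        logits = {a: dot(embeddings[i], embeddings[a]) / temperature
                  for a in range(n) if a != i}
        # Numerically stable log-sum-exp over the denominator.
        m = max(logits.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in logits.values()))
        # Mean negative log-probability of this anchor's positives.
        total += -sum(logits[p] - log_denom for p in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy batch of four videos with made-up 2-d per-modality embeddings.
batch = [
    fuse([1.0, 0.0], [1.0, 0.0], [1.0, 0.0]),
    fuse([0.9, 0.1], [1.0, 0.0], [0.9, 0.1]),
    fuse([0.0, 1.0], [0.0, 1.0], [0.0, 1.0]),
    fuse([0.1, 0.9], [0.0, 1.0], [0.1, 0.9]),
]
# Labels that agree with the geometry vs. labels that contradict it.
aligned = supervised_contrastive_loss(batch, [NON_HATE, NON_HATE, IMPLICIT, IMPLICIT])
mismatched = supervised_contrastive_loss(batch, [NON_HATE, IMPLICIT, NON_HATE, IMPLICIT])
print(f"aligned labels: {aligned:.4f}, mismatched labels: {mismatched:.4f}")
```

Because same-label samples are treated as positives, the loss is close to zero when videos of the same class already sit together in the fused space and grows sharply when the labels disagree with that geometry. That is the pressure such an objective would put on the modality-specific encoders during training.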

