HateClipSeg: A Segment-Level Annotated Dataset for Fine-Grained Hate Video Detection
Han Wang, Zhuoran Wang, Roy Ka-Wei Lee

TL;DR
HateClipSeg introduces a large-scale, fine-grained multimodal dataset with segment-level annotations for hate speech detection in videos, enabling more precise benchmarking and model development.
Contribution
The paper presents HateClipSeg, a novel dataset with video-level and segment-level annotations and three benchmark tasks for hate speech detection in videos, addressing the lack of fine-grained annotations in previous datasets.
Findings
Benchmark results reveal substantial performance gaps in current models.
The dataset enables evaluation of multimodal and temporal detection methods.
High inter-annotator agreement (Krippendorff's alpha = 0.817) supports the annotation quality.
Abstract
Detecting hate speech in videos remains challenging due to the complexity of multimodal content and the lack of fine-grained annotations in existing datasets. We present HateClipSeg, a large-scale multimodal dataset with both video-level and segment-level annotations, comprising over 11,714 segments labeled as Normal or across five Offensive categories: Hateful, Insulting, Sexual, Violence, Self-Harm, along with explicit target victim labels. Our three-stage annotation process yields high inter-annotator agreement (Krippendorff's alpha = 0.817). We propose three tasks to benchmark performance: (1) Trimmed Hateful Video Classification, (2) Temporal Hateful Video Localization, and (3) Online Hateful Video Classification. Results highlight substantial gaps in current models, emphasizing the need for more sophisticated multimodal and temporally aware approaches. The HateClipSeg dataset are…
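For readers unfamiliar with the agreement statistic quoted in the abstract, the following is a minimal sketch of Krippendorff's alpha for nominal labels. It is not the authors' implementation: the function name `krippendorff_alpha_nominal`, the input layout (one list of annotator labels per segment), and the toy labels are assumptions made for illustration only.

```python
from collections import Counter
from itertools import permutations


def krippendorff_alpha_nominal(units):
    """Krippendorff's alpha for nominal labels.

    `units` is a list with one entry per annotated item (here, hypothetically,
    one per video segment); each entry is a list of the labels assigned by the
    annotators, with None marking a missing annotation.
    """
    # Coincidence matrix: every ordered pair of labels from different
    # annotators within a unit contributes 1 / (m - 1), where m is the
    # number of labels available for that unit.
    coincidences = Counter()
    for labels in units:
        labels = [label for label in labels if label is not None]
        m = len(labels)
        if m < 2:
            continue  # units with fewer than two labels carry no pairing information
        for a, b in permutations(labels, 2):
            coincidences[(a, b)] += 1.0 / (m - 1)

    # Marginal totals per label and the overall number of pairable values.
    totals = Counter()
    for (a, _b), weight in coincidences.items():
        totals[a] += weight
    n = sum(totals.values())
    if n <= 1:
        raise ValueError("need at least one unit with two or more labels")

    # Observed and expected disagreement (the common 1/n factor cancels).
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(
        totals[a] * totals[b] for a in totals for b in totals if a != b
    ) / (n - 1)
    if expected == 0:
        return 1.0  # only one label was ever used, so no disagreement is possible
    return 1.0 - observed / expected


# Toy example: three segments, two annotators each (labels are illustrative).
toy_units = [
    ["Hateful", "Hateful"],
    ["Hateful", "Normal"],
    ["Normal", "Normal"],
]
alpha = krippendorff_alpha_nominal(toy_units)
```

A value of 1 indicates perfect agreement, 0 indicates agreement no better than chance, and negative values indicate systematic disagreement. The sketch covers only the nominal case, so ordinal or interval distance functions would need a different disagreement weight.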
