Revealing Temporal Label Noise in Multimodal Hateful Video Classification

Shuonan Yang; Tailin Chen; Rahul Singh; Jiangbei Yue; Jianbo Jiao; Zeyu Fu

arXiv:2508.04900 · cs.CV · August 8, 2025

TL;DR

This paper investigates how coarse video-level annotations introduce label noise in multimodal hateful video detection, revealing the importance of temporal granularity for accurate classification and model robustness.

Contribution

It provides a fine-grained analysis of temporal label noise in hateful videos and demonstrates its impact on model decision boundaries and confidence, emphasizing the need for temporally aware models.

Findings

01. Temporal label noise affects model decisions and confidence.

02. Fine-grained analysis reveals semantic overlap in hateful segments.

03. Temporal context is crucial for robust hate speech detection.

Abstract

The rapid proliferation of online multimedia content has intensified the spread of hate speech, presenting critical societal and regulatory challenges. While recent work has advanced multimodal hateful video detection, most approaches rely on coarse, video-level annotations that overlook the temporal granularity of hateful content. This introduces substantial label noise, as videos annotated as hateful often contain long non-hateful segments. In this paper, we investigate the impact of such label ambiguity through a fine-grained approach. Specifically, we trim hateful videos from the HateMM and MultiHateClip English datasets using annotated timestamps to isolate explicitly hateful segments. We then conduct an exploratory analysis of these trimmed segments to examine the distribution and characteristics of both hateful and non-hateful content. This analysis highlights the degree of…
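
The trimming step described in the abstract can be illustrated with a minimal sketch. It assumes each hateful video comes with a duration and a list of annotated (start, end) timestamps in seconds; that annotation format, and the names `merge_intervals`, `split_by_annotations`, and `non_hateful_fraction`, are hypothetical and not taken from the paper or from the HateMM and MultiHateClip releases. The sketch only computes segment boundaries and does not cut any video files.

```python
from typing import List, Tuple

# A segment is a (start_seconds, end_seconds) pair. This format is an assumption.
Segment = Tuple[float, float]


def merge_intervals(intervals: List[Segment]) -> List[Segment]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Segment] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def split_by_annotations(
    duration: float, hateful: List[Segment]
) -> Tuple[List[Segment], List[Segment]]:
    """Split [0, duration] into hateful and non-hateful segments."""
    # Clip annotations to the video length and drop empty ones.
    clipped = [(max(0.0, s), min(duration, e)) for s, e in hateful]
    hate = merge_intervals([(s, e) for s, e in clipped if e > s])

    # Everything not covered by a hateful segment is non-hateful.
    non_hate: List[Segment] = []
    cursor = 0.0
    for start, end in hate:
        if start > cursor:
            non_hate.append((cursor, start))
        cursor = end
    if cursor < duration:
        non_hate.append((cursor, duration))
    return hate, non_hate


def non_hateful_fraction(duration: float, hateful: List[Segment]) -> float:
    """Share of a video labelled hateful that is actually non-hateful."""
    _, non_hate = split_by_annotations(duration, hateful)
    return sum(end - start for start, end in non_hate) / duration


# Illustrative values only, not data from the paper.
hate_segments, clean_segments = split_by_annotations(
    120.0, [(30.0, 45.0), (40.0, 60.0)]
)
noise_share = non_hateful_fraction(120.0, [(30.0, 45.0), (40.0, 60.0)])
```

In this made-up example, a 120-second video labelled hateful at the video level has a single merged hateful segment from 30 to 60 seconds, so three quarters of its duration would carry a noisy label under video-level annotation.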
