MultiHateLoc: Towards Temporal Localisation of Multimodal Hate Content in Online Videos
Qiyue Sun, Tailin Chen, Yinghui Zhang, Yuchen Zhang, Jiangbei Yue, Jianbo Jiao, Zeyu Fu

TL;DR
This paper introduces MultiHateLoc, a weakly-supervised framework for localising multimodal hate speech in online videos. It models temporal and cross-modal dynamics to produce fine-grained, interpretable predictions.
Contribution
MultiHateLoc is the first framework to address weakly-supervised temporal localisation of multimodal hate speech, integrating modality-aware encoders, dynamic fusion, and contrastive alignment.
Findings
Achieves state-of-the-art localisation performance on the HateMM and MultiHateClip datasets.
Effectively models heterogeneous temporal patterns across modalities.
Produces fine-grained, interpretable frame-level predictions.
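To make "frame-level predictions" concrete, the sketch below shows one common way to turn a sequence of per-frame hate scores into time segments. It is an illustration only, not the authors' post-processing: the function name `scores_to_segments`, the 0.5 threshold, and the `frame_rate` parameter are all assumptions introduced here.

```python
def scores_to_segments(scores, threshold=0.5, frame_rate=1.0):
    """Group consecutive frames scoring at or above `threshold` into segments.

    Hypothetical helper for illustration; the threshold and frame rate are
    assumed values, not taken from the paper.
    Returns a list of (start_seconds, end_seconds) tuples.
    """
    segments = []
    start = None  # index of the first frame in the currently open segment
    for i, score in enumerate(scores):
        if score >= threshold and start is None:
            start = i  # open a new segment
        elif score < threshold and start is not None:
            # close the open segment at the first frame below threshold
            segments.append((start / frame_rate, i / frame_rate))
            start = None
    if start is not None:
        # a segment that runs to the end of the video
        segments.append((start / frame_rate, len(scores) / frame_rate))
    return segments


# Example: five frames sampled at one frame per second
print(scores_to_segments([0.1, 0.8, 0.9, 0.2, 0.7]))
```

Segments produced this way can be compared against annotated time spans, which is how localisation quality is typically measured.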
Abstract
The rapid growth of video content on platforms such as TikTok and YouTube has intensified the spread of multimodal hate speech, where harmful cues emerge subtly and asynchronously across visual, acoustic, and textual streams. Existing research primarily focuses on video-level classification, leaving the practically crucial task of temporal localisation, identifying when hateful segments occur, largely unaddressed. This challenge is even more noticeable under weak supervision, where only video-level labels are available, and static fusion or classification-based architectures struggle to capture cross-modal and temporal dynamics. To address these challenges, we propose MultiHateLoc, the first framework designed for weakly-supervised multimodal hate localisation. MultiHateLoc incorporates (1) modality-aware temporal encoders to model heterogeneous sequential patterns, including a tailored…
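The weak-supervision setting described in the abstract, where only video-level labels are available, is often handled with multiple-instance learning: frame scores are pooled into a single video score, and the loss is computed against the video label. The sketch below uses top-k mean pooling with binary cross-entropy. This is a generic technique shown as an assumption for illustration; the abstract does not state that MultiHateLoc uses it, and the names `topk_mean`, `video_level_bce`, and the default `k=3` are hypothetical.

```python
import math


def topk_mean(frame_scores, k):
    """Average the k highest frame scores (a common MIL pooling choice)."""
    k = max(1, min(k, len(frame_scores)))  # keep k within a valid range
    top = sorted(frame_scores, reverse=True)[:k]
    return sum(top) / k


def video_level_bce(frame_scores, video_label, k=3, eps=1e-7):
    """Binary cross-entropy between the pooled video score and the video label.

    Assumes frame_scores are probabilities in [0, 1] and video_label is 0 or 1.
    Only the video-level label is used; no frame-level annotation is needed.
    """
    p = topk_mean(frame_scores, k)
    p = min(max(p, eps), 1 - eps)  # clamp to avoid log(0)
    return -(video_label * math.log(p) + (1 - video_label) * math.log(1 - p))


# A hateful video (label 1) whose top frames score high gives a low loss
print(video_level_bce([0.1, 0.9, 0.8, 0.2], video_label=1, k=2))
```

Because the loss depends only on the highest-scoring frames, training pushes a few frames up for hateful videos and all frames down for benign ones, which is what lets frame-level scores emerge from video-level labels.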
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Emotion and Mood Recognition · Generative Adversarial Networks and Image Synthesis
