Enhanced Multimodal Hate Video Detection via Channel-wise and Modality-wise Fusion

Yinghui Zhang; Tailin Chen; Yuchen Zhang; Zeyu Fu

arXiv:2505.12051 · cs.MM · May 20, 2025

TL;DR

This paper introduces CMFusion, a novel multimodal hate video detection model that effectively integrates text, audio, and video features through channel-wise and modality-wise fusion, significantly improving detection accuracy.

Contribution

The paper proposes a new fusion mechanism for multimodal hate video detection that captures temporal and modality interactions more effectively than existing methods.

Findings

1. CMFusion outperforms five baseline models in accuracy, precision, recall, and F1 score.

2. Ablation studies confirm the effectiveness of the fusion modules and temporal cross-attention.

3. The model demonstrates robustness across different parameter settings.
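
For readers less familiar with the four metrics named in the first finding, the sketch below shows how they are conventionally computed for a binary classifier. It assumes labels of 1 for "hate" and 0 for "non-hate" and treats "hate" as the positive class; the function name `binary_metrics` and the toy labels are illustrative only, and this summary does not state which averaging scheme the paper itself reports.

```python
def binary_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall, and F1 for binary labels.

    Assumption: 1 marks the positive ("hate") class and 0 the negative class.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0

    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy labels, not data from the paper.
    truth = [1, 1, 0, 0, 1, 0]
    preds = [1, 0, 0, 1, 1, 0]
    print(binary_metrics(truth, preds))
```

Precision penalizes false alarms and recall penalizes missed hate videos, which is why results on this task are usually reported with all four numbers rather than accuracy alone.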

Abstract

The rapid rise of video content on platforms such as TikTok and YouTube has transformed information dissemination, but it has also facilitated the spread of harmful content, particularly hate videos. Despite significant efforts to combat hate speech, detecting these videos remains challenging due to their often implicit nature. Current detection methods primarily rely on unimodal approaches, which inadequately capture the complementary features across different modalities. While multimodal techniques offer a broader perspective, many fail to effectively integrate temporal dynamics and modality-wise interactions essential for identifying nuanced hate content. In this paper, we present CMFusion, an enhanced multimodal hate video detection model utilizing a novel Channel-wise and Modality-wise Fusion Mechanism. CMFusion first extracts features from text, audio, and video modalities using…

Code & Models

Repositories

evelynz10/cmfusion
PyTorch · Official