Reasoning-Aware Multimodal Fusion for Hateful Video Detection
Shuonan Yang, Tailin Chen, Jiangbei Yue, Guangliang Cheng, Jianbo Jiao, Zeyu Fu

TL;DR
This paper introduces RAMF, a novel reasoning-aware multimodal fusion framework that enhances hateful video detection by capturing semantic interactions and reasoning about nuanced hate content, significantly outperforming existing methods.
Contribution
The paper proposes a new RAMF framework with LGCF and SCA modules, and a structured adversarial reasoning process to improve multimodal understanding of hate speech in videos.
Findings
Achieves improvements of 3% in Macro-F1 and 7% in hate class recall, respectively, over existing methods.
Effectively captures local and global contextual cues.
Enables fine-grained semantic interaction across modalities.
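For readers unfamiliar with the two metrics named above, the following is a minimal sketch of how Macro-F1 and hate class recall are conventionally computed for a binary hate / non-hate task. The labels and predictions are made-up toy data, not results from the paper, and the function names are illustrative only.

```python
# Toy illustration of Macro-F1 and per-class recall.
# Assumption: label 1 = hate, label 0 = non-hate; data below is invented.

def per_class_counts(y_true, y_pred, label):
    """Return (true positives, false positives, false negatives) for one class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    return tp, fp, fn

def recall(y_true, y_pred, label):
    """Fraction of items truly in `label` that were predicted as `label`."""
    tp, _, fn = per_class_counts(y_true, y_pred, label)
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(y_true, y_pred, label):
    """Per-class F1, written as 2*TP / (2*TP + FP + FN)."""
    tp, fp, fn = per_class_counts(y_true, y_pred, label)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred, labels):
    """Unweighted mean of per-class F1, so the minority class counts equally."""
    return sum(f1(y_true, y_pred, label) for label in labels) / len(labels)

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]

hate_recall = recall(y_true, y_pred, label=1)
score = macro_f1(y_true, y_pred, labels=[0, 1])
print(f"hate recall = {hate_recall:.3f}, macro-F1 = {score:.3f}")
```

Because Macro-F1 averages the per-class scores without weighting by class size, a gain in hate class recall shows up in it even when hateful videos are a small share of the data.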
Abstract
Hate speech in online videos poses an increasingly serious threat to digital platforms, especially as video content becomes more multimodal and context-dependent. Existing methods often struggle to fuse the complex semantic relationships between modalities and lack the ability to understand nuanced hateful content. To address these issues, we propose an innovative Reasoning-Aware Multimodal Fusion (RAMF) framework. To tackle the first challenge, we design Local-Global Context Fusion (LGCF) to capture both local salient cues and global temporal structures, and propose Semantic Cross Attention (SCA) to enable fine-grained multimodal semantic interaction. To tackle the second challenge, we introduce adversarial reasoning, a structured three-stage process in which a vision-language model generates (i) objective descriptions, (ii) hate-assumed inferences, and (iii)…
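The abstract does not specify how SCA is built. As a rough intuition for what cross-modal attention does in general, here is a minimal sketch of plain scaled dot-product cross-attention, in which features from one modality (the queries) attend over features from another (the keys and values). This is a generic textbook mechanism, not the authors' SCA module; the function names, the absence of learned projections, and the toy feature vectors are all assumptions made for illustration.

```python
import math

# Generic scaled dot-product cross-attention in plain Python.
# Assumption: this is NOT the paper's SCA; it has no learned weights and
# only shows how one modality can attend over another.

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def cross_attention(queries, keys, values):
    """For each query vector, return an attention-weighted mix of `values`."""
    d = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(dimension).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        # Weighted sum of value vectors, one output coordinate at a time.
        mixed = [sum(w * v[j] for w, v in zip(weights, values))
                 for j in range(len(values[0]))]
        outputs.append(mixed)
    return outputs

# Hypothetical toy features: 2 text tokens attend over 3 video frames.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
video_frames = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]]

fused = cross_attention(text_tokens, video_frames, video_frames)
for row in fused:
    print([round(x, 3) for x in row])
```

Each output row is a convex combination of the frame features, weighted toward the frames most similar to the corresponding text token. A real module would typically add learned query, key, and value projections and multiple heads.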
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Generative Adversarial Networks and Image Synthesis · Adversarial Robustness in Machine Learning
