More Than Sum of Its Parts: Deciphering Intent Shifts in Multimodal Hate Speech Detection

Runze Sun; Yu Zheng; Zexuan Xiong; Zhongjin Qu; Lei Chen; Jie Zhou; Jiwen Lu

arXiv:2603.21298 · cs.CL · April 22, 2026


TL;DR

This paper introduces a new benchmark and a reasoning framework to improve multimodal hate speech detection by understanding complex intent shifts between visual and textual content.

Contribution

It presents the H-VLI benchmark for nuanced intent detection and the ARCADE framework that uses simulated debate to enhance model reasoning capabilities.

Findings

01

ARCADE outperforms state-of-the-art baselines on H-VLI

02

The approach improves detection of implicit hate speech cases

03

Code and data are publicly available on GitHub (Sayur1n/H-VLI)

Abstract

Combating hate speech on social media is critical for securing cyberspace, yet relies heavily on the efficacy of automated detection systems. As content formats evolve, hate speech is transitioning from solely plain text to complex multimodal expressions, making implicit attacks harder to spot. Current systems, however, often falter on these subtle cases, as they struggle with multimodal content where the emergent meaning transcends the aggregation of individual modalities. To bridge this gap, we move beyond binary classification to characterize semantic intent shifts where modalities interact to construct implicit hate from benign cues or neutralize toxicity through semantic inversion. Guided by this fine-grained formulation, we curate the Hate via Vision-Language Interplay (H-VLI) benchmark where the true intent hinges on the intricate interplay of modalities rather than overt visual…
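The fine-grained formulation described above can be illustrated with a minimal sketch. It assumes that each post has a hatefulness judgment for the text alone, for the image alone, and for the combined post. The enum names and the three boolean inputs are hypothetical and for illustration only. They are not the H-VLI label schema or ARCADE's interface.

```python
from enum import Enum


class IntentShift(Enum):
    # Hypothetical labels; the actual H-VLI taxonomy may be finer-grained.
    BENIGN = "benign, no shift"
    OVERT_HATE = "hateful, no shift"
    CONSTRUCTED_HATE = "benign cues combine into implicit hate"
    NEUTRALIZED = "toxic cue neutralized by semantic inversion"


def classify_shift(text_hateful: bool, image_hateful: bool, joint_hateful: bool) -> IntentShift:
    """Compare unimodal judgments with the judgment on the combined post."""
    unimodal_hateful = text_hateful or image_hateful
    if joint_hateful and not unimodal_hateful:
        # Neither modality is hateful alone, but their interplay is.
        return IntentShift.CONSTRUCTED_HATE
    if unimodal_hateful and not joint_hateful:
        # A modality looks toxic alone, but the combination inverts its meaning.
        return IntentShift.NEUTRALIZED
    # No shift: the combined reading agrees with the unimodal reading.
    return IntentShift.OVERT_HATE if joint_hateful else IntentShift.BENIGN
```

For example, `classify_shift(False, False, True)` yields `IntentShift.CONSTRUCTED_HATE`, which is the case a detector that merely aggregates per-modality scores would miss.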

Code & Models

Repositories

Sayur1n/H-VLI (GitHub)