TL;DR
This paper introduces a new benchmark and a reasoning framework to improve multimodal hate speech detection by understanding complex intent shifts between visual and textual content.
Contribution
It presents the H-VLI benchmark for nuanced intent detection and the ARCADE framework that uses simulated debate to enhance model reasoning capabilities.
Findings
ARCADE outperforms state-of-the-art baselines on H-VLI
The approach improves detection of implicit hate speech cases
Code and data are publicly available at the provided GitHub link
Abstract
Combating hate speech on social media is critical for securing cyberspace, yet relies heavily on the efficacy of automated detection systems. As content formats evolve, hate speech is transitioning from solely plain text to complex multimodal expressions, making implicit attacks harder to spot. Current systems, however, often falter on these subtle cases, as they struggle with multimodal content where the emergent meaning transcends the aggregation of individual modalities. To bridge this gap, we move beyond binary classification to characterize semantic intent shifts where modalities interact to construct implicit hate from benign cues or neutralize toxicity through semantic inversion. Guided by this fine-grained formulation, we curate the Hate via Vision-Language Interplay (H-VLI) benchmark where the true intent hinges on the intricate interplay of modalities rather than overt visual…
