SAFE-MEME: Structured Reasoning Framework for Robust Hate Speech Detection in Memes
Palash Nandi, Shivam Sharma, Tanmoy Chakraborty

TL;DR
SAFE-MEME introduces a structured reasoning framework and new datasets for detecting nuanced hate speech in memes, significantly improving robustness and accuracy over existing methods.
Contribution
The paper presents SAFE-MEME, a novel multimodal reasoning framework with hierarchical categorization and new datasets for fine-grained hate speech detection in memes.
Findings
SAFE-MEME-QA outperforms existing baselines by roughly 4-5% on average.
SAFE-MEME-H outperforms baselines in regular scenarios (the MHS dataset).
Fine-tuning adapters can outperform full fine-tuning in certain cases.
Abstract
Memes act as cryptic tools for sharing sensitive ideas, often requiring contextual knowledge to interpret. This makes moderating multimodal memes challenging, as existing works either lack high-quality datasets on nuanced hate categories or rely on low-quality social media visuals. Here, we curate two novel multimodal hate speech datasets, MHS and MHS-Con, that capture fine-grained hateful abstractions in regular and confounding scenarios, respectively. We benchmark these datasets against several competing baselines. Furthermore, we introduce SAFE-MEME (Structured reAsoning FramEwork), a novel multimodal Chain-of-Thought-based framework employing Q&A-style reasoning (SAFE-MEME-QA) and hierarchical categorization (SAFE-MEME-H) to enable robust hate speech detection in memes. SAFE-MEME-QA outperforms existing baselines, achieving an average improvement of approximately 5% and 4% on MHS…
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining
Methods: Adapter
