The Hateful Memes Challenge: Detecting Hate Speech in Multimodal Memes
Douwe Kiela, Hamed Firooz, Aravind Mohan, Vedanuj Goswami, Amanpreet Singh, Pratik Ringshia, Davide Testuggine

TL;DR
This paper introduces a challenging multimodal meme dataset for hate speech detection, emphasizing the difficulty of the task and the gap between current models and human performance.
Contribution
It presents a new benchmark dataset with challenging examples to evaluate multimodal hate speech detection models, highlighting the need for more sophisticated approaches.
Findings
State-of-the-art models reach 64.73% accuracy, well below human accuracy of 84.7%
Unimodal models struggle because benign confounders make text-only or image-only signals unreliable
The task requires subtle multimodal reasoning, yet is evaluated as simple binary classification
Abstract
This work proposes a new challenge set for multimodal classification, focusing on detecting hate speech in multimodal memes. It is constructed such that unimodal models struggle and only multimodal models can succeed: difficult examples ("benign confounders") are added to the dataset to make it hard to rely on unimodal signals. The task requires subtle reasoning, yet is straightforward to evaluate as a binary classification problem. We provide baseline performance numbers for unimodal models, as well as for multimodal models with various degrees of sophistication. We find that state-of-the-art methods perform poorly compared to humans (64.73% vs. 84.7% accuracy), illustrating the difficulty of the task and highlighting the challenge that this important problem poses to the community.
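To illustrate why benign confounders penalize unimodal models, here is a minimal sketch on a toy dataset. The four examples, their identifiers, and the helper `best_accuracy` are hypothetical and made up for illustration; they are not taken from the paper's data or code. Each meme is reduced to a text ID and an image ID, and the sketch computes the best accuracy any classifier could reach if it saw only the text, only the image, or both.

```python
from collections import Counter, defaultdict

# Toy, made-up examples (NOT from the actual dataset):
# (text_id, image_id, label) with 1 = hateful, 0 = benign.
TOY_MEMES = [
    ("text_a", "image_x", 1),  # a hateful meme
    ("text_a", "image_y", 0),  # benign confounder: same text, different image
    ("text_b", "image_x", 0),  # benign confounder: same image, different text
    ("text_b", "image_y", 1),  # a second hateful meme
]


def best_accuracy(examples, key):
    """Upper bound on accuracy for a classifier that only sees key(example).

    Examples with the same visible input must receive the same prediction,
    so the best strategy is to predict the majority label within each group.
    """
    groups = defaultdict(Counter)
    for example in examples:
        groups[key(example)][example[2]] += 1
    correct = sum(max(counts.values()) for counts in groups.values())
    return correct / len(examples)


text_only = best_accuracy(TOY_MEMES, key=lambda ex: ex[0])
image_only = best_accuracy(TOY_MEMES, key=lambda ex: ex[1])
multimodal = best_accuracy(TOY_MEMES, key=lambda ex: (ex[0], ex[1]))

print(f"text only:  {text_only:.2f}")
print(f"image only: {image_only:.2f}")
print(f"multimodal: {multimodal:.2f}")
```

In this toy setup each modality on its own is uninformative (chance-level accuracy), while the combination separates the classes perfectly. The real dataset is far less clean than this, but the sketch shows the mechanism the abstract describes.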
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Humor Studies and Applications · Sentiment Analysis and Opinion Mining
