Detecting Hate Speech in Memes Using Multimodal Deep Learning Approaches: Prize-winning solution to Hateful Memes Challenge
Riza Velioglu, Jewgeni Rose

TL;DR
This paper presents a multimodal deep learning approach that combines VisualBERT with ensemble learning to detect hate speech in memes, reaching 0.811 AUROC and 0.765 accuracy on the Hateful Memes Challenge test set.
Contribution
It applies an ensemble of VisualBERT models to multimodal hate speech detection in memes and placed third out of 3,173 participants in the Hateful Memes Challenge.
Findings
Achieved 0.811 AUROC on the challenge test set
Achieved 0.765 accuracy on the challenge test set
Placed third out of 3,173 participants
Abstract
Memes on the Internet are often harmless and sometimes amusing. However, by using certain types of images, text, or combinations of both, the seemingly harmless meme becomes a multimodal type of hate speech -- a hateful meme. The Hateful Memes Challenge is a first-of-its-kind competition that focuses on detecting hate speech in multimodal memes and proposes a new data set containing 10,000+ new examples of multimodal content. We utilize VisualBERT -- which is meant to be the BERT of vision and language -- that was trained multimodally on images and captions, and apply Ensemble Learning. Our approach achieves 0.811 AUROC with an accuracy of 0.765 on the challenge test set and placed third out of 3,173 participants in the Hateful Memes Challenge.
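As a rough illustration of how an ensemble's output relates to the two reported metrics, the sketch below averages per-meme "hateful" probabilities from several models and scores the result with AUROC and accuracy. It is a minimal sketch under assumptions, not the authors' implementation: the abstract does not state how the models were combined, so probability averaging is used here purely for illustration, and the function names (`average_ensemble`, `auroc`, `accuracy`), the 0.5 threshold, and the toy data are all hypothetical.

```python
from typing import List, Sequence


def average_ensemble(model_probs: Sequence[Sequence[float]]) -> List[float]:
    """Average each example's 'hateful' probability across models.

    Assumption: averaging is an illustrative combination rule only;
    the source does not specify the rule actually used.
    """
    n_models = len(model_probs)
    return [sum(column) / n_models for column in zip(*model_probs)]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random positive example is
    scored above a random negative one (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one positive and one negative")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives
        for n in negatives
    )
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], scores: Sequence[float],
             threshold: float = 0.5) -> float:
    """Fraction of examples classified correctly at the given threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


# Toy data (hypothetical): 1 = hateful, 0 = not hateful.
labels = [1, 0, 1, 0]
model_probs = [
    [0.9, 0.2, 0.4, 0.6],  # model A misranks the last two memes
    [0.8, 0.1, 0.7, 0.3],
    [0.7, 0.3, 0.6, 0.3],
]
ensemble_scores = average_ensemble(model_probs)

print("model A AUROC:", auroc(labels, model_probs[0]))
print("ensemble AUROC:", auroc(labels, ensemble_scores))
print("ensemble accuracy:", accuracy(labels, ensemble_scores))
```

AUROC depends only on how the scores rank the examples, while accuracy also depends on the decision threshold, which is one reason the two numbers are reported separately.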
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Humor Studies and Applications · Multimodal Machine Learning Applications
Methods: Linear Layer · VisualBERT · Softmax · WordPiece · Linear Warmup With Linear Decay · Adam · Dense Connections · Attention Dropout · Layer Normalization
