Multimodal Learning for Hateful Memes Detection
Yi Zhou, Zhenhao Chen

TL;DR
This paper proposes a multimodal approach to hateful meme detection that incorporates image captioning into the detection process, targeting the weak alignment between a meme's image and its text, and reports its effectiveness on multimodal meme datasets.
Contribution
The paper proposes a new method that incorporates image captioning into hateful meme detection, improving multimodal reasoning capabilities.
Findings
Achieved promising results on the Hateful Memes Detection Challenge.
Demonstrated the effectiveness of integrating captioning with hate detection.
Improved detection accuracy over baseline models.
Abstract
Memes are used for spreading ideas through social networks. Although most memes are created for humor, some memes become hateful under the combination of pictures and text. Automatically detecting hateful memes can help reduce their harmful social impact. Unlike conventional multimodal tasks, where the visual and textual information is semantically aligned, the challenge of hateful memes detection lies in its unique multimodal information. The image and text in memes are weakly aligned or even irrelevant, which requires the model to understand the content and perform reasoning over multiple modalities. In this paper, we focus on multimodal hateful memes detection and propose a novel method that incorporates the image captioning process into the memes detection process. We conduct extensive experiments on multimodal meme datasets and illustrate the effectiveness of our approach.…
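The abstract does not spell out how the captioning signal is combined with the image and meme text, so the following is only a minimal sketch of one plausible arrangement, not the authors' architecture. It assumes three precomputed feature vectors (image, overlaid meme text, generated caption), fuses them by simple concatenation, and scores the result with a logistic classifier. The function names `concat_fusion` and `hateful_probability`, the toy feature values, and the zero weights are all hypothetical placeholders.

```python
import math
from typing import List, Sequence


def concat_fusion(
    image_feat: Sequence[float],
    text_feat: Sequence[float],
    caption_feat: Sequence[float],
) -> List[float]:
    """Fuse the three modalities by concatenation (an assumed, simple choice)."""
    return list(image_feat) + list(text_feat) + list(caption_feat)


def hateful_probability(
    fused: Sequence[float], weights: Sequence[float], bias: float
) -> float:
    """Logistic classifier over the fused vector; returns P(hateful)."""
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


# Toy stand-ins for encoder outputs (hypothetical values, not real features).
image_feat = [0.2, -0.1]    # e.g. from a visual encoder
text_feat = [0.5, 0.3]      # e.g. from the meme's overlaid text
caption_feat = [-0.4, 0.1]  # e.g. from a caption generated for the image

fused = concat_fusion(image_feat, text_feat, caption_feat)
weights = [0.0] * len(fused)  # untrained placeholder weights
prob = hateful_probability(fused, weights, bias=0.0)
```

The caption vector gives the classifier a textual description of the image that can be compared against the overlaid text, which is one way to expose a mismatch between the two modalities. In practice the weights would be learned and the fusion would likely be more elaborate than concatenation.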
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Humor Studies and Applications · Multimodal Machine Learning Applications
