A Review of Vision-Language Models and their Performance on the Hateful Memes Challenge

Bryan Zhao; Andrew Zhang; Blake Watson; Gillian Kearney; Isaac Dale

arXiv:2305.06159·cs.CL·May 11, 2023·2 cites

TL;DR

This paper evaluates various multimodal models for detecting hateful memes, finding that early fusion models, especially CLIP, outperform late fusion approaches in the Hateful Memes Challenge.

Contribution

It provides a comparative analysis of early and late fusion multimodal models for hate speech detection in memes, highlighting the superior performance of early fusion methods.

Findings

01 Early fusion models outperform late fusion models.

02 CLIP achieved the highest AUROC of 70.06.

03 Early fusion models are more effective for multimodal hate detection.
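
The AUROC figure in the findings is the area under the ROC curve. It equals the probability that a randomly chosen hateful meme receives a higher score than a randomly chosen non-hateful one. The following is a minimal sketch of that computation, assuming binary labels (1 = hateful, 0 = not hateful) and real-valued model scores. The function name `auroc` and the toy data are illustrative assumptions, not taken from the paper or its repository.

```python
def auroc(labels, scores):
    """Pairwise AUROC: fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    This O(P * N) form is meant for illustration, not for large datasets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three hateful and three non-hateful memes.
toy_labels = [1, 0, 1, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.1]
toy_auroc = auroc(toy_labels, toy_scores)
```

The function returns a value between 0 and 1. A figure such as 70.06 is the same quantity expressed on a 0 to 100 scale.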

Abstract

Moderation of social media content is currently a highly manual task, yet too much content is posted daily to moderate it effectively by hand. With the advent of a number of multimodal models, there is the potential to reduce the amount of manual labor this task requires. In this work, we explore different models and determine which is most effective for the Hateful Memes Challenge, a challenge by Meta designed to further machine learning research in content moderation. Specifically, we explore the differences between early fusion and late fusion models in classifying multimodal memes containing text and images. We first implement unimodal baselines for text and images using BERT and ResNet-152, respectively. The outputs of these unimodal models are then concatenated to create a late fusion model. In terms of early fusion models, we implement ConcatBERT,…
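
The late fusion step described in the abstract, concatenating the outputs of the text and image models, can be sketched in a few lines. This is only an illustration under stated assumptions. The abstract does not say what sits on top of the concatenated outputs, so the single linear layer with a sigmoid, the tiny feature sizes, and all names here (`late_fusion_logit`, `sigmoid`, the example weights) are hypothetical. In practice the two feature vectors would come from BERT and ResNet-152, not from hand-written lists.

```python
import math


def sigmoid(z):
    """Map a real-valued logit to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def late_fusion_logit(text_features, image_features, weights, bias=0.0):
    """Concatenate unimodal outputs, then apply one linear layer.

    Each modality is encoded independently; the only interaction
    between text and image happens after concatenation.
    """
    fused = list(text_features) + list(image_features)  # the fusion step
    if len(fused) != len(weights):
        raise ValueError("weights must match the concatenated feature length")
    return sum(w * x for w, x in zip(weights, fused)) + bias


# Hypothetical stand-ins for the text and image model outputs.
example_text = [1.0, 2.0]
example_image = [3.0]
example_weights = [0.5, -1.0, 1.0]  # assumed classifier weights

example_logit = late_fusion_logit(example_text, example_image, example_weights, bias=0.5)
example_prob = sigmoid(example_logit)  # assumed probability that the meme is hateful
```

In this sketch the two modalities never see each other before the final layer. That separation is what distinguishes late fusion from the early fusion models the abstract goes on to list.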

Code & Models

Repositories

bzhao18/cs-7643-project (PyTorch, official)

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts

Methods: Attention Is All You Need · Linear Warmup With Linear Decay · Softmax · Layer Normalization · Linear Layer · WordPiece · Dropout · Weight Decay · Multi-Head Attention