HateProof: Are Hateful Meme Detection Systems really Robust?

Piush Aggarwal; Pranit Chawla; Mithun Das; Punyajoy Saha; Binny Mathew; Torsten Zesch; Animesh Mukherjee

arXiv:2302.05703 · cs.CL · February 14, 2023

TL;DR

This paper investigates the vulnerabilities of hateful meme detection systems against simple adversarial attacks and proposes methods like contrastive learning and adversarial training to improve robustness.

Contribution

It presents a comprehensive vulnerability analysis of existing hateful meme detection models and introduces a combined approach to enhance their robustness against adversarial perturbations.

Findings

01. The macro-F1 score of detection models drops by as much as 10% under certain attacks.
02. An ensemble of contrastive learning and adversarial training recovers much of the lost performance.
03. Simple perturbations performed by humans with little knowledge of the model significantly challenge current detection systems.

Abstract

The exploitation of social media to spread hate has increased tremendously over the years. Lately, multi-modal hateful content such as memes has gained relatively more traction than uni-modal content. Moreover, implicit content payloads make such memes fairly challenging for existing hateful meme detection systems to detect. In this paper, we present a use case study to analyze such systems' vulnerabilities against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings, performed by humans with little knowledge about the model, can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the model's robustness using contrastive learning as well as an adversarial training-based method…
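
The abstract reports robustness as a drop in macro-F1 under attack. As a rough illustration of how such a drop could be measured, the sketch below computes macro-F1 on clean and perturbed text and returns the difference. Everything in it is an assumption made for illustration: the keyword classifier, the space-insertion perturbation, and the names `macro_f1`, `insert_spaces`, `toy_classifier`, and `macro_f1_drop` are hypothetical and are not taken from the paper, whose models and attacks may differ.

```python
from typing import Callable, List, Sequence


def macro_f1(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores: List[float] = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted instances contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def insert_spaces(text: str) -> str:
    """Hypothetical perturbation: put a space between every character."""
    return " ".join(text)


def toy_classifier(text: str) -> int:
    """Hypothetical stand-in model: flags text containing a fixed keyword."""
    return 1 if "badword" in text.lower() else 0


def macro_f1_drop(
    classifier: Callable[[str], int],
    perturb: Callable[[str], str],
    texts: Sequence[str],
    labels: Sequence[int],
) -> float:
    """Macro-F1 on clean inputs minus macro-F1 on perturbed inputs."""
    clean = macro_f1(labels, [classifier(t) for t in texts])
    attacked = macro_f1(labels, [classifier(perturb(t)) for t in texts])
    return clean - attacked


texts = ["a badword here", "nice day", "badword again", "good morning"]
labels = [1, 0, 1, 0]
drop = macro_f1_drop(toy_classifier, insert_spaces, texts, labels)
print(f"macro-F1 drop under perturbation: {drop:.3f}")
```

Macro-F1 averages the per-class F1 scores without weighting by class frequency, so a model that stops detecting the positive class after perturbation is penalized even when the negative class dominates the data.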

Taxonomy

Methods: Contrastive Learning