HateProof: Are Hateful Meme Detection Systems really Robust?
Piush Aggarwal, Pranit Chawla, Mithun Das, Punyajoy Saha, Binny Mathew, Torsten Zesch, Animesh Mukherjee

TL;DR
This paper investigates the vulnerabilities of hateful meme detection systems against simple adversarial attacks and proposes methods like contrastive learning and adversarial training to improve robustness.
Contribution
It presents a comprehensive vulnerability analysis of existing hateful meme detection models and introduces a combined approach to enhance their robustness against adversarial perturbations.
Findings
Detection models' macro-F1 score drops by as much as 10% under certain attacks.
An ensemble of contrastive learning and adversarial training recovers much of the lost performance.
Simple human-performed perturbations significantly challenge current detection systems.
Abstract
Exploiting social media to spread hate has tremendously increased over the years. Lately, multi-modal hateful content such as memes has drawn relatively more traction than uni-modal content. Moreover, the availability of implicit content payloads makes them fairly challenging to be detected by existing hateful meme detection systems. In this paper, we present a use case study to analyze such systems' vulnerabilities against external adversarial attacks. We find that even very simple perturbations in uni-modal and multi-modal settings performed by humans with little knowledge about the model can make the existing detection models highly vulnerable. Empirically, we find a noticeable performance drop of as high as 10% in the macro-F1 score for certain attacks. As a remedy, we attempt to boost the model's robustness using contrastive learning as well as an adversarial training-based method…
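As a rough illustration of how a macro-F1 drop such as the one reported above could be measured, the sketch below computes macro-F1 on predictions for original memes and for perturbed ones, then takes the difference. The labels and predictions are made-up toy values, and the names macro_f1, clean_pred, and attacked_pred are hypothetical; none of this comes from the paper's code or data.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy data (assumption): 1 = hateful, 0 = not hateful.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
clean_pred = [1, 1, 1, 0, 0, 0, 0, 1]      # predictions on original memes
attacked_pred = [1, 0, 0, 0, 0, 0, 0, 1]   # predictions on perturbed memes

clean_score = macro_f1(y_true, clean_pred)
attacked_score = macro_f1(y_true, attacked_pred)
drop = clean_score - attacked_score

print(f"clean macro-F1:    {clean_score:.3f}")
print(f"attacked macro-F1: {attacked_score:.3f}")
print(f"drop:              {drop:.3f}")
```

Macro-F1 averages the per-class scores without weighting by class frequency, so a collapse on the hateful class shows up in the score even when that class is the minority.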
Taxonomy
Methods: Contrastive Learning
