Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers

Ruofei Wang; Hongzhan Lin; Ziyuan Luo; Ka Chun Cheung; Simon See; Jing Ma; Renjie Wan

arXiv:2412.15503 · cs.CR · December 23, 2024

TL;DR

This paper identifies backdoor attacks as a previously overlooked threat to hateful meme detection systems and presents Meme Trojan, an effective and stealthy attack framework built on cross-modal triggers.

Contribution

The paper introduces Meme Trojan, a backdoor attack method for hateful meme detectors that uses cross-modal triggers and adaptive injection techniques.

Findings

01

Meme Trojan outperforms existing backdoor attack methods in effectiveness.

02

The proposed triggers are highly stealthy and seamlessly integrated.

03

The framework demonstrates significant success under automatic application scenarios.

Abstract

Hateful meme detection aims to prevent the proliferation of hateful memes on various social media platforms. Considering its impact on social environments, this paper introduces a previously ignored but significant threat to hateful meme detection: backdoor attacks. By injecting specific triggers into meme samples, backdoor attackers can manipulate the detector to output their desired outcomes. To explore this, we propose the Meme Trojan framework to initiate backdoor attacks on hateful meme detection. Meme Trojan involves creating a novel Cross-Modal Trigger (CMT) and a learnable trigger augmentor to enhance the trigger pattern according to each input sample. Due to the cross-modal property, the proposed CMT can effectively initiate backdoor attacks on hateful meme detectors under an automatic application scenario. Additionally, the injection position and size of our triggers are…

Code & Models

Repositories

rfww/cmtmeme
PyTorch · Official

Videos

Meme Trojan: Backdoor Attacks Against Hateful Meme Detection via Cross-Modal Triggers · underline

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts