Pro-Cap: Leveraging a Frozen Vision-Language Model for Hateful Meme Detection

Rui Cao; Ming Shan Hee; Adriel Kuek; Wen-Haw Chong; Roy Ka-Wei Lee; Jing Jiang

arXiv:2308.08088·cs.CV·August 17, 2023·1 citation

TL;DR

Pro-Cap is a probing-based captioning approach for hateful meme detection: it prompts a frozen vision-language model with hate-related questions in a zero-shot visual question answering manner and uses the answers as informative image captions, which improve detection performance on three benchmarks.

Contribution

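The paper presents a probing-based captioning method that uses frozen pre-trained vision-language models (PVLMs) for hateful meme detection without fine-tuning them, avoiding the cost of fine-tuning ever-larger models while producing more informative image captions than generic captioning.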

Findings

01

Pro-Cap achieves strong performance on three benchmarks.

02

The probing-based captions capture information relevant to hateful content.

03

It demonstrates good generalization across datasets.

Abstract

Hateful meme detection is a challenging multimodal task that requires comprehension of both vision and language, as well as cross-modal interactions. Recent studies have tried to fine-tune pre-trained vision-language models (PVLMs) for this task. However, with increasing model sizes, it becomes important to leverage powerful PVLMs more efficiently, rather than simply fine-tuning them. Recently, researchers have attempted to convert meme images into textual captions and prompt language models for predictions. This approach has shown good performance but suffers from non-informative image captions. Considering the two factors mentioned above, we propose a probing-based captioning approach to leverage PVLMs in a zero-shot visual question answering (VQA) manner. Specifically, we prompt a frozen PVLM by asking hateful content-related questions and use the answers as image captions (which we…
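The probing step described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the question list, the `toy_vqa` stand-in for a frozen PVLM, and the prompt format are all hypothetical. The sketch only shows the data flow: query a frozen VQA model with hate-related questions, keep the answers as captions, and merge them with the meme text for a text-only classifier.

```python
from typing import Callable, Dict, List

# Hypothetical probing questions, for illustration only; the question set
# used by Pro-Cap may differ.
PROBE_QUESTIONS: Dict[str, str] = {
    "race": "What is the race of the person in the image?",
    "gender": "What is the gender of the person in the image?",
    "religion": "What is the religion of the person in the image?",
    "animal": "Is there an animal in the image?",
}

# A VQA model is any callable (image, question) -> answer. In Pro-Cap this
# role is played by a frozen PVLM queried zero-shot (its weights are never updated).
VQAModel = Callable[[object, str], str]


def probe_captions(image: object, vqa: VQAModel,
                   questions: Dict[str, str] = PROBE_QUESTIONS) -> List[str]:
    """Ask every probing question and keep each answer, tagged by its topic."""
    return [f"{topic}: {vqa(image, q).strip()}" for topic, q in questions.items()]


def build_prompt(meme_text: str, captions: List[str]) -> str:
    """Merge the probe answers and the meme text into one input string for a
    downstream text-only classifier (hypothetical format)."""
    return f"Image: {'; '.join(captions)}. Text: {meme_text}"


def toy_vqa(image: object, question: str) -> str:
    """Toy stand-in for a frozen PVLM that returns canned answers."""
    if "animal" in question:
        return "no"
    if "gender" in question:
        return "man"
    return "unknown"


if __name__ == "__main__":
    captions = probe_captions(image=None, vqa=toy_vqa)
    print(build_prompt("example meme text", captions))
    # prints: Image: race: unknown; gender: man; religion: unknown; animal: no. Text: example meme text
```

Replacing `toy_vqa` with a real frozen PVLM and feeding the resulting string to a prompted language model would mirror the pipeline the abstract outlines; neither step is shown here.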

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Multimodal Machine Learning Applications · Misinformation and Its Impacts