MemeCLIP: Leveraging CLIP Representations for Multimodal Meme Classification
Siddhant Bikram Shah, Shuvam Shiwakoti, Maheep Chaudhary, Haohan Wang

TL;DR
MemeCLIP builds on pre-trained CLIP representations to classify memes along several aspects (hate, targets of hate, stance, and humor), introduces the PrideMM dataset, and outperforms previous models on benchmark datasets.
Contribution
This paper presents MemeCLIP, a novel framework that enhances multimodal meme classification by utilizing pre-trained CLIP, along with a new dataset for LGBTQ+ Pride memes.
Findings
MemeCLIP outperforms previous models on benchmark datasets.
The new PrideMM dataset fills a gap in multimodal meme analysis.
MemeCLIP shows competitive zero-shot performance compared to GPT-4.
Abstract
The complexity of text-embedded images presents a formidable challenge in machine learning given the need for multimodal understanding of multiple aspects of expression conveyed by them. While previous research in multimodal analysis has primarily focused on singular aspects such as hate speech and its subclasses, this study expands this focus to encompass multiple aspects of linguistics: hate, targets of hate, stance, and humor. We introduce a novel dataset PrideMM comprising 5,063 text-embedded images associated with the LGBTQ+ Pride movement, thereby addressing a serious gap in existing resources. We conduct extensive experimentation on PrideMM by using unimodal and multimodal baseline methods to establish benchmarks for each task. Additionally, we propose a novel framework MemeCLIP for efficient downstream learning while preserving the knowledge of the pre-trained CLIP model. The…
Taxonomy
Topics: Misinformation and Its Impacts · Humor Studies and Applications · Hate Speech and Cyberbullying Detection
Methods: Attention Is All You Need · Linear Layer · Position-Wise Feed-Forward Layer · Label Smoothing · Byte Pair Encoding · Absolute Position Encodings · Softmax · Layer Normalization · Dropout · Dense Connections
