A CLIP-based siamese approach for meme classification
Javier Huertas-Tato, Christos Koutlis, Symeon Papadopoulos, David Camacho, and Ioannis Kompatsiaris

TL;DR
This paper introduces SimCLIP, a deep learning model using CLIP and Siamese networks for effective cross-modal meme classification, achieving state-of-the-art results and enabling scalable meme moderation.
Contribution
The paper presents a novel cross-modal meme classification architecture leveraging pre-trained CLIP and Siamese fusion, setting new performance benchmarks across multiple datasets.
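As a rough illustration of what "Siamese fusion" over CLIP embeddings could look like, the sketch below passes an image embedding and a text embedding through one shared projection and then combines the results. This is a minimal sketch under stated assumptions, not the authors' implementation: the summary above does not specify the fusion operation, so the elementwise product and absolute difference used here are a common Siamese-style choice picked for illustration. The random vectors stand in for real CLIP encoder outputs, and the names `siamese_fuse`, `shared_weights`, `EMB_DIM`, and `OUT_DIM` are hypothetical.

```python
import math
import random


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def project(weights, vec):
    """Apply a linear map given as a list of rows."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def siamese_fuse(image_emb, text_emb, shared_weights):
    """Fuse two embeddings through ONE shared projection (the Siamese part).

    Assumption: the fused vector concatenates both projected embeddings with
    their elementwise product and absolute difference. The paper may use a
    different interaction.
    """
    img = l2_normalize(project(shared_weights, image_emb))
    txt = l2_normalize(project(shared_weights, text_emb))
    product = [a * b for a, b in zip(img, txt)]
    difference = [abs(a - b) for a, b in zip(img, txt)]
    return img + txt + product + difference


# Stand-ins for CLIP image/text embeddings (hypothetical sizes, random values).
rng = random.Random(0)
EMB_DIM, OUT_DIM = 8, 4
image_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]
text_emb = [rng.gauss(0, 1) for _ in range(EMB_DIM)]
shared_weights = [[rng.gauss(0, 1) for _ in range(EMB_DIM)] for _ in range(OUT_DIM)]

fused = siamese_fuse(image_emb, text_emb, shared_weights)
print(len(fused))  # the fused vector would feed a small classification head
```

Because both modalities go through the same weights, the interaction terms (product and difference) do not depend on which input is treated as "image" and which as "text"; only the order of the first two concatenated segments changes.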
Findings
State-of-the-art F1-score improvement on Memotion7k
Super-human performance on Harm-P dataset
Efficient and accurate meme classification model
Abstract
Memes are an increasingly prevalent element of online discourse in social networks, especially among young audiences. They carry ideas and messages that range from humorous to hateful, and are widely consumed. Their potentially high impact requires adequate means of control to moderate their use at scale. In this work, we propose SimCLIP, a deep learning-based architecture for cross-modal understanding of memes, leveraging a pre-trained CLIP encoder to produce context-aware embeddings and a Siamese fusion technique to capture the interactions between text and image. We perform extensive experiments on seven meme classification tasks across six datasets. We establish a new state of the art on Memotion7k with a 7.25% relative F1-score improvement, and achieve super-human performance on Harm-P with a 13.73% F1-score improvement. Our approach demonstrates the potential for compact…
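The abstract reports a relative F1-score improvement, which is easy to confuse with an absolute gain in F1 points. The short sketch below shows how a relative improvement is computed from a baseline score and a new score. The scores used are hypothetical placeholders chosen for round arithmetic, not values from the paper, and `relative_improvement` is an illustrative helper name.

```python
def relative_improvement(baseline, new):
    """Return the relative change from baseline to new, in percent."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline * 100.0


# Hypothetical F1 scores, NOT taken from the paper.
baseline_f1 = 0.50
new_f1 = 0.55

absolute_gain = new_f1 - baseline_f1
relative_gain = relative_improvement(baseline_f1, new_f1)
print(f"absolute gain: {absolute_gain:.2f} F1, relative gain: {relative_gain:.2f}%")
```

With these placeholder numbers, a gain of 0.05 F1 corresponds to a 10% relative improvement, so the same result looks larger when quoted in relative terms against a low baseline.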
Taxonomy
Topics: Biochemical Analysis and Sensing Techniques · Influenza Virus Research Studies
