MemeCap: A Dataset for Captioning and Interpreting Memes

EunJeong Hwang; Vered Shwartz

arXiv:2305.13703 · cs.CL · May 24, 2023 · 1 citation

TL;DR

MemeCap introduces a new dataset for meme captioning that includes visual metaphors and background context, and shows that current vision-language models still struggle to interpret memes.

Contribution

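The paper introduces the task of meme captioning and releases MemeCap, a dataset of 6.3K memes annotated with post titles, meme captions, literal image captions, and visual metaphors. Experiments show that existing vision-and-language models fall substantially short of human performance on it.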

Findings

1. State-of-the-art vision-and-language (VL) models perform substantially worse than humans at meme interpretation.
2. The dataset enables research on understanding visual metaphors in memes.
3. Current models struggle with visual metaphors despite their success on related tasks such as image captioning and visual question answering.

Abstract

Memes are a widely popular tool for web users to express their thoughts using visual metaphors. Understanding memes requires recognizing and interpreting visual metaphors with respect to the text inside or around the meme, often while employing background knowledge and reasoning abilities. We present the task of meme captioning and release a new dataset, MemeCap. Our dataset contains 6.3K memes along with the title of the post containing the meme, the meme captions, the literal image caption, and the visual metaphors. Despite the recent success of vision and language (VL) models on tasks such as image captioning and visual question answering, our extensive experiments using state-of-the-art VL models show that they still struggle with visual metaphors, and perform substantially worse than humans.
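To make the annotation layout described in the abstract concrete, here is a minimal sketch of one record and of how it might be split into input and targets for meme captioning. It assumes the model sees the meme image plus the post title and must produce a meme caption. The class `MemeRecord`, the helper `captioning_example`, all field names, and the example values are hypothetical illustrations, not the official schema or data of the eujhwang/meme-cap repository.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class MemeRecord:
    """One MemeCap-style example. Field names are hypothetical, not the official schema."""
    image_path: str            # path to the meme image (hypothetical field)
    title: str                 # title of the post containing the meme
    meme_captions: List[str]   # what the meme conveys; the captioning targets
    image_caption: str         # literal description of what the image shows
    # visual metaphors as (visual element, what it stands for) pairs (assumed representation)
    metaphors: List[Tuple[str, str]] = field(default_factory=list)


def captioning_example(rec: MemeRecord) -> Tuple[Tuple[str, str], List[str]]:
    """Split a record into (inputs, targets), assuming inputs are image + post title."""
    return (rec.image_path, rec.title), rec.meme_captions


# Invented example record, for illustration only.
rec = MemeRecord(
    image_path="memes/0001.png",
    title="Me on Monday morning",
    meme_captions=["The poster feels exhausted at the start of the work week."],
    image_caption="A tired cat lying face down on a bed.",
    metaphors=[("cat", "the poster")],
)
inputs, targets = captioning_example(rec)
print(inputs)  # ('memes/0001.png', 'Me on Monday morning')
```

Keeping the literal image caption and the metaphors as separate fields, rather than folding them into the target, reflects the abstract's distinction between describing what an image shows and interpreting what the meme means.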

Code & Models

Repositories

eujhwang/meme-cap (official)

Taxonomy

Topics: Language, Metaphor, and Cognition · Multimodal Machine Learning Applications · Humor Studies and Applications