MemeMQA: Multimodal Question Answering for Memes via Rationale-Based Inferencing
Siddhant Agarwal, Shivam Sharma, Preslav Nakov, Tanmoy Chakraborty

TL;DR
MemeMQA introduces a multimodal question-answering framework for memes that combines reasoning and explanations, supported by a new dataset and outperforming baselines in accuracy and semantic alignment.
Contribution
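This paper introduces MemeMQA, a multimodal question-answering task for memes that pairs each answer with an explanation. It also contributes MemeMQACorpus, a dataset of 1,880 questions over 1,122 memes, and ARSENAL, a two-stage framework that uses the reasoning capabilities of LLMs to address the task.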
Findings
ARSENAL, the proposed framework, achieves ~18% higher answer accuracy than baselines on MemeMQA.
It provides coherent explanations alongside answers.
The framework demonstrates robustness across diverse question sets.
Abstract
Memes have evolved as a prevalent medium for diverse communication, ranging from humour to propaganda. With the rising popularity of image-focused content, there is a growing need to explore its potential harm from different aspects. Previous studies have analyzed memes in closed settings - detecting harm, applying semantic labels, and offering natural language explanations. To extend this research, we introduce MemeMQA, a multimodal question-answering framework aiming to solicit accurate responses to structured questions while providing coherent explanations. We curate MemeMQACorpus, a new dataset featuring 1,880 questions related to 1,122 memes with corresponding answer-explanation pairs. We further propose ARSENAL, a novel two-stage multimodal framework that leverages the reasoning capabilities of LLMs to address MemeMQA. We benchmark MemeMQA using competitive baselines and…
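The dataset shape described in the abstract (questions tied to memes, each with an answer-explanation pair) and the notion of answer accuracy can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: the `MemeQAExample` class, its field names, and the case-insensitive exact-match scoring rule do not come from the paper, which may structure its records and compute accuracy differently. The example entries are invented toy data, not taken from MemeMQACorpus.

```python
from dataclasses import dataclass


@dataclass
class MemeQAExample:
    """One hypothetical corpus entry; field names are illustrative assumptions."""
    meme_id: str       # identifier of the meme image the question refers to
    question: str      # structured question about the meme
    answer: str        # gold answer
    explanation: str   # gold explanation paired with the answer


def answer_accuracy(examples, predictions):
    """Fraction of predicted answers matching the gold answer.

    Matching is case-insensitive exact match after trimming whitespace,
    which is an assumed scoring rule, not necessarily the paper's.
    """
    if len(examples) != len(predictions):
        raise ValueError("examples and predictions must have the same length")
    if not examples:
        return 0.0
    correct = sum(
        ex.answer.strip().lower() == pred.strip().lower()
        for ex, pred in zip(examples, predictions)
    )
    return correct / len(examples)


# Invented toy data for demonstration only.
examples = [
    MemeQAExample("m1", "toy question 1", "option a", "toy explanation 1"),
    MemeQAExample("m2", "toy question 2", "option b", "toy explanation 2"),
]
predictions = ["Option A", "option c"]
print(answer_accuracy(examples, predictions))  # 0.5
```

A relative improvement such as the reported ~18% would then be the difference between two such accuracy values, one for the proposed system and one for a baseline, measured on the same question set.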
Taxonomy
Topics: Sentiment Analysis and Opinion Mining · Misinformation and Its Impacts
