Decoding Memes: Benchmarking Narrative Role Classification across Multilingual and Multimodal Models
Shivam Sharma, Tanmoy Chakraborty

TL;DR
This paper benchmarks multilingual and multimodal models on classifying narrative roles in internet memes. It highlights the challenges posed by cultural and linguistic diversity and the importance of prompt design and multimodal reasoning.
Contribution
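It explores a more balanced, linguistically diverse extension of an existing annotated meme dataset (originally introduced as part of the CLEF 2024 shared task) and evaluates a wide range of models, revealing their strengths and limitations in narrative role classification.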
Findings
Larger models like DeBERTa-v3 and Qwen2.5-VL perform better but still struggle with the 'Victim' role.
Multimodal and prompt engineering strategies yield marginal improvements.
Cultural and linguistic diversity poses significant challenges for role classification.
Abstract
This work investigates the challenging task of identifying narrative roles - Hero, Villain, Victim, and Other - in Internet memes, across three diverse test sets spanning English and code-mixed (English-Hindi) languages. Building on an annotated dataset originally skewed toward the 'Other' class, we explore a more balanced and linguistically diverse extension, originally introduced as part of the CLEF 2024 shared task. Comprehensive lexical and structural analyses highlight the nuanced, culture-specific, and context-rich language used in real memes, in contrast to synthetically curated hateful content, which exhibits explicit and repetitive lexical markers. To benchmark the role detection task, we evaluate a wide spectrum of models, including fine-tuned multilingual transformers, sentiment and abuse-aware classifiers, instruction-tuned LLMs, and multimodal vision-language models.…
