On VLMs for Diverse Tasks in Multimodal Meme Classification

Deepesh Gavit; Debajyoti Mazumder; Samiran Das; Jasabanta Patro

arXiv:2505.20937·cs.CL·May 28, 2025

Open Access

TL;DR

This paper systematically analyzes vision-language models (VLMs) for meme classification and introduces an approach that combines VLMs with LLMs to improve classification performance across several meme-understanding tasks.

Contribution

It presents a new method that uses detailed meme interpretations from VLMs to train smaller LLMs, enhancing classification performance.

Findings

1. Improved baseline performance by up to 26.24% in sentiment classification.

2. Benchmarking of VLMs with diverse prompting strategies.

3. Assessment of LoRA fine-tuning across VLM components.

Abstract

In this paper, we present a comprehensive and systematic analysis of vision-language models (VLMs) for disparate meme classification tasks. We introduce a novel approach that generates a VLM-based understanding of meme images and fine-tunes LLMs on a textual understanding of the embedded meme text to improve performance. Our contributions are threefold: (1) benchmarking VLMs with diverse prompting strategies tailored to each sub-task; (2) evaluating LoRA fine-tuning across all VLM components to assess performance gains; and (3) proposing a novel approach in which detailed meme interpretations generated by VLMs are used to train smaller language models (LLMs), significantly improving classification. The strategy of combining VLMs with LLMs improved the baseline performance by 8.34%, 3.52% and 26.24% for sarcasm, offensive and sentiment classification, respectively. Our results…
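The two-stage idea in the abstract (a VLM interprets the meme, then a smaller LLM is fine-tuned on text) can be illustrated with a minimal sketch of the data-preparation step. Everything below is an assumption for illustration only: the `Meme` fields, the prompt template, and the `build_example` helper are hypothetical and are not taken from the paper, and the VLM call is represented by a precomputed interpretation string.

```python
from dataclasses import dataclass


@dataclass
class Meme:
    embedded_text: str   # text overlaid on the meme image (e.g. from OCR)
    interpretation: str  # stage 1: description generated by a VLM (precomputed here)
    label: str           # gold label for the chosen task


def build_example(meme: Meme, task: str) -> dict:
    """Stage 2 input: turn one meme into a text-only fine-tuning example.

    The template is a hypothetical choice; the paper's actual prompts
    are not given in the abstract.
    """
    prompt = (
        f"Task: {task} classification\n"
        f"Meme text: {meme.embedded_text}\n"
        f"Interpretation: {meme.interpretation}\n"
        "Label:"
    )
    # The leading space keeps the label separated from "Label:" when
    # prompt and completion are concatenated for training.
    return {"prompt": prompt, "completion": " " + meme.label}


if __name__ == "__main__":
    meme = Meme(
        embedded_text="Monday again",
        interpretation="A tired cat suggests dread about the work week.",
        label="sarcastic",
    )
    print(build_example(meme, "sarcasm")["prompt"])
```

Examples in this form could then be passed to any text-only fine-tuning routine; which LLM and training setup the authors used is not stated in the text above.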

Taxonomy

Topics: Misinformation and Its Impacts · Sentiment Analysis and Opinion Mining · Advanced Malware Detection Techniques