Can Thinking Models Think to Detect Hateful Memes?

Mohamed Bayan Kmainasi; Mucahid Kutlu; Ali Ezzat Shahroor; Abul Hasnat; Firoj Alam

arXiv:2603.01225·cs.CL·March 3, 2026

Open Access

TL;DR

This paper studies whether thinking-based multimodal large language models, post-trained with reinforcement learning, can detect hateful memes more accurately while producing better reasoning and explanations.

Contribution

The paper introduces a GRPO-based post-training framework, extends an existing hateful meme dataset with distilled chain-of-thought rationales, and reports state-of-the-art results in hateful meme detection.

Findings

01

Improved accuracy and F1 scores on Hateful Memes benchmark

02

Enhanced explanation quality with step-by-step reasoning

03

Effective reinforcement learning approach for multimodal reasoning

Abstract

Hateful memes often require compositional multimodal reasoning: the image and text may appear benign in isolation, yet their interaction conveys harmful intent. Although thinking-based multimodal large language models (MLLMs) have recently advanced vision-language understanding, their capabilities remain underexplored for hateful meme analysis. We propose a reinforcement learning based post-training framework that improves reasoning in thinking-based MLLMs through task-specific rewards and a novel Group Relative Policy Optimization (GRPO) objective. Specifically, we (i) conduct a systematic empirical study of off-the-shelf MLLMs for hateful meme understanding, (ii) extend an existing hateful meme dataset by generating weakly or pseudo-supervised chain-of-thought rationales via distillation, and (iii) introduce a GRPO-based objective that jointly optimizes meme classification and…
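The group-relative idea behind GRPO can be illustrated with a minimal sketch. It shows only the standard step of standardizing rewards within a group of responses sampled for the same input; it is not the paper's objective, which the truncated abstract does not fully specify. The reward function `toy_reward`, its 0.5 rationale bonus, the label strings, and the use of the population standard deviation are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled responses.

    Each response is scored relative to the other responses drawn for the
    same input, so no separate value network is needed.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def toy_reward(predicted_label, gold_label, has_rationale):
    """Hypothetical task reward, not taken from the paper.

    1.0 for a correct label, plus a 0.5 bonus when a rationale is present.
    """
    reward = 1.0 if predicted_label == gold_label else 0.0
    if has_rationale:
        reward += 0.5
    return reward


# Four sampled responses for one meme whose gold label is "hateful".
samples = [
    ("hateful", True),
    ("hateful", False),
    ("not_hateful", True),
    ("not_hateful", False),
]
rewards = [toy_reward(label, "hateful", has_r) for label, has_r in samples]
advantages = group_relative_advantages(rewards)

for (label, has_r), adv in zip(samples, advantages):
    print(f"{label:12s} rationale={has_r!s:5s} advantage={adv:+.3f}")
```

In this toy setup the correct, explained response receives the largest advantage and the incorrect, unexplained one the smallest, which is the ordering a policy update would then reinforce.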

Taxonomy

Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Multimodal Machine Learning Applications