Can Thinking Models Think to Detect Hateful Memes?
Mohamed Bayan Kmainasi, Mucahid Kutlu, Ali Ezzat Shahroor, Abul Hasnat, Firoj Alam

TL;DR
This paper explores post-training thinking-based multimodal large language models with reinforcement learning to improve hateful meme detection, as well as the quality of the step-by-step explanations behind each prediction.
Contribution
It introduces a novel GRPO-based training framework, extends an existing hateful meme dataset with distilled chain-of-thought rationales, and achieves state-of-the-art results in hateful meme detection.
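In standard GRPO, no learned value function is used. Instead, the policy samples a group of responses for the same input (here, one meme) and scores each response against the rest of its group. The minimal sketch below shows only that standard group-relative advantage step. It is not the paper's novel objective, which this summary does not specify. The function name and the `eps` parameter are illustrative assumptions.

```python
import statistics


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Standard GRPO advantage: (reward - group mean) / (group std + eps).

    `rewards` holds one scalar reward per response sampled for the SAME meme.
    Population std is used here; implementations differ on this detail.
    """
    if not rewards:
        return []
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    # eps avoids division by zero when every response in the group scores the same
    return [(r - mean) / (std + eps) for r in rewards]


# Hypothetical group of four sampled responses for one meme: two correct, two wrong
advantages = group_relative_advantages([1.0, 0.0, 1.0, 0.0])
```

Correct responses receive a positive advantage and incorrect ones a negative advantage. A group in which every response earns the same reward produces all-zero advantages and therefore no learning signal for that meme.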
Findings
Improved accuracy and F1 scores on the Hateful Memes benchmark
Enhanced explanation quality with step-by-step reasoning
Effective reinforcement learning approach for multimodal reasoning
Abstract
Hateful memes often require compositional multimodal reasoning: the image and text may appear benign in isolation, yet their interaction conveys harmful intent. Although thinking-based multimodal large language models (MLLMs) have recently advanced vision-language understanding, their capabilities remain underexplored for hateful meme analysis. We propose a reinforcement-learning-based post-training framework that improves reasoning in thinking-based MLLMs through task-specific rewards and a novel Group Relative Policy Optimization (GRPO) objective. Specifically, we (i) conduct a systematic empirical study of off-the-shelf MLLMs for hateful meme understanding, (ii) extend an existing hateful meme dataset by generating weakly or pseudo-supervised chain-of-thought rationales via distillation, and (iii) introduce a GRPO-based objective that jointly optimizes meme classification and…
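The abstract mentions task-specific rewards but does not spell them out in this excerpt. The sketch below shows, purely as an illustration, how a rule-based classification reward for a thinking model might be written. Several details are hypothetical assumptions, not the authors' design: the `<think>…</think><answer>…</answer>` output format, the two label strings, and the `w_format` / `w_label` weights. The objective's other jointly optimized terms are truncated in the abstract and are not modeled here.

```python
import re

# Hypothetical output format: free-form reasoning, then a single label.
OUTPUT_PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>\s*(hateful|not hateful)\s*</answer>\s*$",
    re.DOTALL | re.IGNORECASE,
)


def meme_reward(completion: str, gold_label: str,
                w_format: float = 0.5, w_label: float = 1.0) -> float:
    """Hypothetical reward: format bonus plus label correctness.

    Returns 0.0 for malformed outputs, w_format for well-formed but wrong
    answers, and w_format + w_label for well-formed, correct answers.
    """
    match = OUTPUT_PATTERN.match(completion)
    if match is None:
        return 0.0  # no reasoning block or no parsable label
    predicted = match.group(2).lower()
    label_reward = 1.0 if predicted == gold_label.lower() else 0.0
    return w_format * 1.0 + w_label * label_reward


reward = meme_reward(
    "<think>The caption is harmless alone, but paired with the image it targets a group.</think>"
    "<answer>hateful</answer>",
    gold_label="hateful",
)
```

Separating a format term from a correctness term is a common choice for rule-based RL rewards. It keeps the reasoning trace parsable while the correctness term drives classification accuracy.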
Taxonomy
Topics: Hate Speech and Cyberbullying Detection · Misinformation and Its Impacts · Multimodal Machine Learning Applications
