M3Hop-CoT: Misogynous Meme Identification with Multimodal Multi-hop Chain-of-Thought
Gitanjali Kumari, Kirtan Jain, Asif Ekbal

TL;DR
This paper introduces M3Hop-CoT, a multimodal multi-hop chain-of-thought framework that improves misogynous meme detection by integrating visual and textual cues, emotional context, and cultural knowledge, validated on multiple datasets.
Contribution
The paper proposes a multimodal multi-hop CoT framework that combines a CLIP-based classifier with a multimodal CoT reasoning module, uses cultural and emotional cues for misogynous meme detection, and is reported to outperform existing methods.
Findings
Achieves a high macro-F1 score on the SemEval-2022 MAMI dataset.
Demonstrates strong generalization across various meme datasets.
Effectively incorporates emotion and context in multimodal reasoning.
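For readers unfamiliar with the metric named in the findings, macro-F1 is the unweighted mean of the per-class F1 scores, so the misogynous and non-misogynous classes count equally even when one is rarer. The snippet below is a minimal, dependency-free sketch of that standard definition; the function name `macro_f1` and the toy labels are illustrative assumptions and are not taken from the paper or its evaluation code.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        # Treat undefined precision/recall (zero denominator) as 0.0.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Toy labels (hypothetical): 1 = misogynous, 0 = not misogynous.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
score = macro_f1(y_true, y_pred)
```

In this toy case the per-class F1 scores are 2/3 for class 1 and 4/5 for class 0, so the macro average is 11/15. Because each class contributes equally, a classifier cannot obtain a high macro-F1 simply by favoring the majority class.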
Abstract
In recent years, there has been a significant rise in the phenomenon of hate against women on social media platforms, particularly through the use of misogynous memes. These memes often target women with subtle and obscure cues, making their detection a challenging task for automated systems. Recently, Large Language Models (LLMs) have shown promising results in reasoning using Chain-of-Thought (CoT) prompting to generate the intermediate reasoning chains as the rationale to facilitate multimodal tasks, but often neglect cultural diversity and key aspects like emotion and contextual knowledge hidden in the visual modalities. To address this gap, we introduce a Multimodal Multi-hop CoT (M3Hop-CoT) framework for Misogynous meme identification, combining a CLIP-based classifier and a multimodal CoT module with entity-object-relationship integration. M3Hop-CoT employs a three-step…
Taxonomy
Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Gothic Literature and Media Analysis
