Towards Explainable Harmful Meme Detection through Multimodal Debate between Large Language Models

Hongzhan Lin; Ziyang Luo; Wei Gao; Jing Ma; Bo Wang; Ruichao Yang

arXiv:2401.13298 · cs.CL · January 25, 2024 · 1 citation

TL;DR

This paper introduces an explainable method for detecting harmful memes. Large language models debate the meme from harmless and harmful positions to generate conflicting rationales, and a fine-tuned judge model reasons over those rationales to infer harmfulness, improving both detection accuracy and interpretability.

Contribution

The paper presents a novel multimodal debate-based approach leveraging LLMs for generating explanations and a fine-tuned judge model for harm detection in memes.

Findings

1. Outperforms state-of-the-art methods on three datasets
2. Provides interpretable explanations for harmfulness decisions
3. Demonstrates effective multimodal reasoning over memes

Abstract

The age of social media is flooded with Internet memes, necessitating a clear grasp and effective identification of harmful ones. This task presents a significant challenge due to the implicit meaning embedded in memes, which is not explicitly conveyed through the surface text and image. However, existing harmful meme detection methods do not present readable explanations that unveil such implicit meaning to support their detection decisions. In this paper, we propose an explainable approach to detect harmful memes, achieved through reasoning over conflicting rationales from both harmless and harmful positions. Specifically, inspired by the powerful capacity of Large Language Models (LLMs) on text generation and reasoning, we first elicit multimodal debate between LLMs to generate the explanations derived from the contradictory arguments. Then we propose to fine-tune a small language…
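The debate step described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Generate`, `Debate`, `build_prompt`, `run_debate`, `build_judge_input`, and `stub_llm` are hypothetical, the prompt wording is invented for illustration, and the image is represented by a text caption, whereas the paper elicits a multimodal debate. The judge itself (a fine-tuned model in the paper) is not modeled here; the sketch only assembles the text it would reason over.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical type: any function that maps a prompt string to generated text.
Generate = Callable[[str], str]


@dataclass
class Debate:
    """The two conflicting rationales produced for one meme."""
    harmless: str
    harmful: str


def build_prompt(meme_text: str, image_caption: str, position: str) -> str:
    # Assumption: the image is summarized as a caption; the prompt wording is illustrative.
    return (
        f"Meme text: {meme_text}\n"
        f"Image description: {image_caption}\n"
        f"Argue that this meme is {position} and explain its implicit meaning."
    )


def run_debate(meme_text: str, image_caption: str, generate: Generate) -> Debate:
    # Ask the same generator to argue each side, yielding contradictory rationales.
    return Debate(
        harmless=generate(build_prompt(meme_text, image_caption, "harmless")),
        harmful=generate(build_prompt(meme_text, image_caption, "harmful")),
    )


def build_judge_input(meme_text: str, debate: Debate) -> str:
    # Assemble the text a separate judge model would read to decide harmfulness.
    return (
        f"Meme text: {meme_text}\n"
        f"Rationale (harmless): {debate.harmless}\n"
        f"Rationale (harmful): {debate.harmful}\n"
        "Which rationale is better supported?"
    )


def stub_llm(prompt: str) -> str:
    # Stand-in for a real LLM call: echoes the instruction line of the prompt.
    instruction = prompt.splitlines()[-1]
    return f"[stub rationale] {instruction}"
```

With the stub in place, `run_debate("look who is back", "a man waving at a crowd", stub_llm)` returns a `Debate` whose two fields argue opposite positions, and `build_judge_input` concatenates them into a single input. Swapping `stub_llm` for a real model call is the only change needed to experiment with actual rationales.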

Code & Models

Repositories

hkbunlp/explainhm-www2024 (PyTorch, official)

Taxonomy

Topics: Misinformation and Its Impacts · Hate Speech and Cyberbullying Detection · Sentiment Analysis and Opinion Mining