PrismAgent: Illuminating Harm in Memes via a Zero-Shot Interpretable Multi-Agent Framework

Zihan Ding; Ziyuan Yang; Yi Zhang

arXiv:2605.02940 · cs.LG · May 6, 2026

TL;DR

PrismAgent is a zero-shot, interpretable multi-agent framework that detects harmful memes by simulating a criminal investigation. It outperforms existing zero-shot detection methods on three datasets without relying on annotated training data.

Contribution

It introduces a novel multi-agent, interpretability-focused approach for harmful meme detection that operates effectively in zero-shot settings, reducing reliance on annotated datasets.

Findings

01. Outperforms existing zero-shot detection methods on three datasets.

02. Provides explicit, interpretable reasoning steps for each detection.

03. Identifies harmful content without annotated training data.

Abstract

The rapid spread of memes makes harmful content detection increasingly crucial, as effective identification can curb the circulation of misinformation. However, existing methods rely heavily on high-volume annotated data, which leads to substantial training costs and limited generalization. To address these challenges, we propose PrismAgent, a zero-shot, multi-agent, interpretable framework. PrismAgent conceptualizes this task as a criminal case investigation, employing four specialized agents responsible for the analysis, investigation, prosecution, and judgment stages within a structured collaborative workflow. In the first stage, the analyst agent paraphrases each meme under benevolent and malicious assumptions to probe its underlying intent. The investigator agent then retrieves supporting evidence from an unannotated dataset and constructs contextual interpretations for the meme…
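
The staged workflow described in the abstract can be pictured as a simple pipeline that passes a shared case file from one agent to the next. The following is only a minimal sketch under stated assumptions, not the authors' implementation: every name in it (CaseFile, analyst, investigator, prosecutor, judge, run_pipeline, echo_llm) is hypothetical, the language model is replaced by a stub function, and retrieval is reduced to toy word overlap over meme captions. Because the abstract is truncated, the prosecution and judgment stages are known here only by name, so their prompts are placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stand-in for a language-model call: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class CaseFile:
    """Shared record that each agent reads from and writes to."""
    meme_text: str
    benign_reading: str = ""
    malicious_reading: str = ""
    evidence: List[str] = field(default_factory=list)
    argument: str = ""
    verdict: str = ""


def analyst(case: CaseFile, llm: LLM) -> CaseFile:
    # Paraphrase the meme under a benevolent and a malicious assumption.
    case.benign_reading = llm(f"Paraphrase assuming benevolent intent: {case.meme_text}")
    case.malicious_reading = llm(f"Paraphrase assuming malicious intent: {case.meme_text}")
    return case


def investigator(case: CaseFile, corpus: List[str], k: int = 2) -> CaseFile:
    # Toy retrieval (an assumption): rank unannotated items by word overlap.
    query = set(case.meme_text.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query & set(doc.lower().split())),
        reverse=True,
    )
    case.evidence = ranked[:k]
    return case


def prosecutor(case: CaseFile, llm: LLM) -> CaseFile:
    # Placeholder prompt: the source names this stage but does not detail it.
    case.argument = llm(
        "Build an argument from these readings and evidence: "
        f"{case.benign_reading} | {case.malicious_reading} | {case.evidence}"
    )
    return case


def judge(case: CaseFile, llm: LLM) -> CaseFile:
    # Placeholder prompt: the source names this stage but does not detail it.
    case.verdict = llm(f"Decide harmful or harmless given: {case.argument}")
    return case


def run_pipeline(meme_text: str, corpus: List[str], llm: LLM) -> CaseFile:
    case = CaseFile(meme_text=meme_text)
    case = analyst(case, llm)
    case = investigator(case, corpus)
    case = prosecutor(case, llm)
    return judge(case, llm)


def echo_llm(prompt: str) -> str:
    # Stub model used only to make the sketch runnable without any API.
    return "[stub] " + prompt
```

Keeping every intermediate output in one case file is one way to expose the reasoning trail for inspection: the two readings, the retrieved evidence, and the argument all remain available alongside the final verdict. Swapping echo_llm for a real model call would not change the control flow.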
