PrismAgent: Illuminating Harm in Memes via a Zero-Shot Interpretable Multi-Agent Framework
Zihan Ding, Ziyuan Yang, Yi Zhang

TL;DR
PrismAgent is a zero-shot, interpretable multi-agent framework that detects harmful memes by simulating a criminal investigation process, significantly outperforming existing methods without relying on annotated training data.
Contribution
PrismAgent introduces a multi-agent, interpretability-focused approach to harmful meme detection that operates in zero-shot settings, reducing reliance on annotated datasets.
Findings
Outperforms existing zero-shot detection methods on three datasets.
Provides explicit, interpretable reasoning steps for each detection.
Identifies harmful content effectively without annotated training data.
Abstract
The rapid spread of memes makes harmful content detection increasingly crucial, as effective identification can curb the circulation of misinformation. However, existing methods rely heavily on high-volume annotated data, which leads to substantial training costs and limited generalization. To address these challenges, we propose PrismAgent, a zero-shot, multi-agent, interpretable framework. PrismAgent conceptualizes this task as a criminal case investigation, employing four specialized agents responsible for the analysis, investigation, prosecution, and judgment stages within a structured collaborative workflow. In the first stage, the analyst agent paraphrases each meme under benevolent and malicious assumptions to probe its underlying intent. The investigator agent then retrieves supporting evidence from an unannotated dataset and constructs contextual interpretations for the meme…
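The four-stage workflow described in the abstract can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the `CaseFile` fields, every prompt string, and the token-overlap retrieval are hypothetical, and memes are reduced to their text for simplicity. Because the abstract is cut off before it describes the prosecution and judgment stages, those two functions are placeholders only.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical interface: any function that maps a prompt to a text reply.
LLM = Callable[[str], str]


@dataclass
class CaseFile:
    """Shared record passed between agents (field names are assumptions)."""
    meme_text: str
    benign_reading: str = ""
    malicious_reading: str = ""
    evidence: List[str] = field(default_factory=list)
    argument: str = ""
    verdict: str = ""


def analyst(case: CaseFile, llm: LLM) -> CaseFile:
    # Stage 1: paraphrase the meme under benevolent and malicious assumptions.
    case.benign_reading = llm(f"Paraphrase assuming benevolent intent: {case.meme_text}")
    case.malicious_reading = llm(f"Paraphrase assuming malicious intent: {case.meme_text}")
    return case


def investigator(case: CaseFile, corpus: List[str], k: int = 2) -> CaseFile:
    # Stage 2: retrieve supporting evidence from an unannotated corpus.
    # Token overlap stands in for whatever retriever the paper actually uses.
    query = set(case.meme_text.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(query & set(doc.lower().split())),
        reverse=True,
    )
    case.evidence = ranked[:k]
    return case


def prosecutor(case: CaseFile, llm: LLM) -> CaseFile:
    # Stage 3 (placeholder): argue from both readings and the evidence.
    case.argument = llm(
        "Argue whether this meme is harmful. "
        f"Benign: {case.benign_reading} | Malicious: {case.malicious_reading} | "
        f"Evidence: {case.evidence}"
    )
    return case


def judge(case: CaseFile, llm: LLM) -> CaseFile:
    # Stage 4 (placeholder): map the model's reply to a binary verdict.
    reply = llm(f"Verdict (harmful or harmless) given: {case.argument}")
    case.verdict = "harmful" if reply.strip().lower().startswith("harmful") else "harmless"
    return case


def run_case(meme_text: str, corpus: List[str], llm: LLM) -> CaseFile:
    case = CaseFile(meme_text=meme_text)
    case = analyst(case, llm)
    case = investigator(case, corpus)
    case = prosecutor(case, llm)
    return judge(case, llm)
```

Keeping every intermediate output on one `CaseFile` means the readings, retrieved evidence, and argument remain available for inspection after the verdict, which is one simple way to expose the reasoning steps the summary highlights.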
