ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection

Yingjia Xu; Jiulong Wu; Bowen Zhang; Baokui Guo; Siyuan Chai; Min Cao

arXiv:2605.20867·cs.MA·May 21, 2026

TL;DR

ProCrit is a self-elicited multi-perspective reasoning framework with critic-guided revision for multimodal sarcasm detection. It targets two problems: the diversity of sarcastic mechanisms and the lack of process supervision.

Contribution

The paper proposes ProCrit, a Proposal-Critic two-agent framework in which a proposal agent autonomously generates multiple analytical perspectives for each sample and a critic agent evaluates them and guides targeted revision.
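The proposal-critic interaction can be pictured as a short control loop. The following is a minimal sketch under stated assumptions, not the authors' implementation: the names `Analysis`, `Critique`, `propose_and_revise`, `max_rounds`, and the toy agents are all hypothetical, and the stopping rule (stop when the critic accepts or a round budget runs out) is an assumption this page does not confirm.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Analysis:
    perspectives: List[str]  # perspectives the proposal agent chose for this sample
    label: str               # "sarcastic" or "not_sarcastic"


@dataclass
class Critique:
    accept: bool   # True when the critic finds the analysis reliable
    guidance: str  # targeted revision hint; empty when accepted


# Hypothetical agent signatures: in practice each would wrap a multimodal model.
Proposer = Callable[[str, str, Optional[Critique]], Analysis]
Critic = Callable[[str, str, Analysis], Critique]


def propose_and_revise(text: str, image_ref: str, proposer: Proposer,
                       critic: Critic, max_rounds: int = 2) -> Analysis:
    """Propose an analysis, then revise it while the critic objects (assumed loop)."""
    analysis = proposer(text, image_ref, None)
    for _ in range(max_rounds):
        critique = critic(text, image_ref, analysis)
        if critique.accept:
            break
        # Feed the critic's guidance back into the proposal agent.
        analysis = proposer(text, image_ref, critique)
    return analysis


# Toy stand-ins so the loop can be run without any model (illustrative only).
def toy_proposer(text: str, image_ref: str, critique: Optional[Critique]) -> Analysis:
    if critique is None:
        return Analysis(["literal sentiment of the caption"], "not_sarcastic")
    return Analysis(["literal sentiment of the caption", critique.guidance], "sarcastic")


def toy_critic(text: str, image_ref: str, analysis: Analysis) -> Critique:
    if len(analysis.perspectives) < 2:
        return Critique(False, "text-image incongruity")
    return Critique(True, "")


result = propose_and_revise("What lovely weather", "storm.jpg", toy_proposer, toy_critic)
print(result.label, result.perspectives)
```

With the toy agents, the first proposal uses a single perspective, the critic rejects it and names a missing one, and the revised proposal is accepted on the second check. Keeping the critic as a separate callable mirrors the description of the critic as an external evaluator rather than a self-check inside the proposal agent.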

Findings

01. ProCrit outperforms existing methods on three benchmark datasets.

02. Synthesizing process-level reasoning annotations enhances model training.

03. Critic-guided targeted revisions improve reasoning reliability.

Abstract

Multimodal sarcasm detection requires reasoning over cross-modal incongruities between literal expression and intended meaning, yet the specific analytical perspectives needed vary across samples due to the diversity of sarcastic mechanisms. While recent methods make this analytical process explicit, they still rely on fixed, predefined perspectives that operate independently under hand-crafted routing rules. We argue that multimodal sarcasm detection instead calls for self-elicited multi-perspective reasoning, where a model autonomously generates the perspectives needed for each sample and progressively integrates them into a coherent analysis. To realize this goal, we propose ProCrit, a Proposal-Critic two-agent framework with a proposal agent for multi-perspective reasoning and a critic agent for external evaluation and targeted revision guidance. First, to overcome the lack of…
