PC-MNet: Dual-Level Congruity Modeling for Multimodal Sarcasm Detection via Polarity-Modulated Attention

Maoheng Li; Ling Zhou; Xiaohua Huang; Rubing Huang; Wenming Zheng; Guoying Zhao

arXiv:2605.02447·cs.CL·May 5, 2026

TL;DR

This paper introduces PC-MNet, a novel dual-level congruity modeling approach for multimodal sarcasm detection that outperforms previous methods by effectively capturing subtle incongruities using a scalar routing mechanism and contrastive learning.

Contribution

The paper proposes a dual-level congruity modeling framework that combines a scalar congruity routing mechanism with a prior-guided contextual graph, and reports state-of-the-art results in multimodal sarcasm detection.

Findings

01

Achieves 3.14% higher Macro-F1 than previous best on MUStARD.

02

Effectively isolates atomic, composition, and contextual conflicts.

03

Demonstrates robustness on the MUStARD benchmark and on its spurious-correlation-mitigated balanced datasets.

Abstract

Multimodal sarcasm detection, which aims to precisely identify pragmatic incongruities between literal text and nonverbal cues, has gained substantial attention in multimodal understanding. Recent advancements have predominantly relied on naïve similarity-based attention mechanisms and uniform late fusion strategies. Furthermore, given that functional entanglement restricts traditional late fusions, we incorporate a scalar congruity routing mechanism and a prior-guided contextual graph. This mechanism anchors a generalized incongruity manifold through a two-stage asymmetric optimization driven by inconsistency-aware contrastive learning, selectively fusing only the most discriminative multi-granularity evidence. Extensive experiments on the MUStARD benchmark and its spurious-correlation-mitigated balanced datasets demonstrate that our approach achieves new state-of-the-art…
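
The abstract contrasts uniform late fusion with a scalar routing mechanism that selectively fuses evidence, but it does not give the formulation. As a rough illustration of the general idea only, the sketch below gates each feature branch with one learned scalar before summing. This is a generic scalar-gated fusion, not the authors' method: the function name `scalar_gate_fusion`, the branch names, the sigmoid gate, and the weight layout are all assumptions made for this example.

```python
import math


def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def scalar_gate_fusion(features, gate_weights, gate_biases):
    """Fuse several equally sized feature vectors with one scalar gate per branch.

    features:     dict mapping a branch name to a list of floats
    gate_weights: dict mapping the same names to weight vectors (hypothetical parameters)
    gate_biases:  dict mapping the same names to scalar biases (hypothetical parameters)
    Returns (fused_vector, gates), where gates maps each name to a value in (0, 1).
    """
    dim = len(next(iter(features.values())))
    fused = [0.0] * dim
    gates = {}
    for name, vec in features.items():
        # One scalar score per branch: a linear projection of the branch features.
        score = sum(w * v for w, v in zip(gate_weights[name], vec)) + gate_biases[name]
        gate = sigmoid(score)
        gates[name] = gate
        # A branch with a gate near 0 contributes almost nothing to the fusion.
        for i, v in enumerate(vec):
            fused[i] += gate * v
    return fused, gates


# Illustrative branch names only; the values are made up for the example.
features = {"atomic": [1.0, 2.0], "contextual": [3.0, 4.0]}
weights = {"atomic": [0.0, 0.0], "contextual": [0.0, 0.0]}
biases = {"atomic": 0.0, "contextual": -20.0}
fused, gates = scalar_gate_fusion(features, weights, biases)
```

With zero weights, a bias of 0 gives a gate of 0.5 and a strongly negative bias drives the gate toward 0, so in this toy call the fused vector is dominated by the "atomic" branch. Uniform late fusion corresponds to the special case where every gate is fixed to the same constant.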
