InterCLIP-MEP: Interactive CLIP and Memory-Enhanced Predictor for Multi-modal Sarcasm Detection

Junjie Chen; Hang Yu; Subin Huang; Sanmin Liu; Linfeng Zhang

arXiv:2406.16464 · cs.CL · November 14, 2025

TL;DR

InterCLIP-MEP is a novel multi-modal sarcasm detection framework that combines efficient cross-modal representations with a memory-augmented predictor, achieving state-of-the-art results with fewer trainable parameters and improved robustness.

Contribution

The paper introduces InterCLIP-MEP, integrating Interactive CLIP for enriched cross-modal features and a Memory-Enhanced Predictor for improved sarcasm detection, with significantly fewer trainable parameters.

Findings

01. Achieves state-of-the-art performance on MMSD, MMSD2.0, and DocMSU datasets.

02. Improves accuracy by 1.08% and F1 score by 1.51% on MMSD2.0.

03. Outperforms previous methods under distributional shift, with nearly 10% higher accuracy.

Abstract

Sarcasm in social media, frequently conveyed through the interplay of text and images, presents significant challenges for sentiment analysis and intention mining. Existing multi-modal sarcasm detection approaches have been shown to excessively depend on superficial cues within the textual modality, exhibiting limited capability to accurately discern sarcasm through subtle text-image interactions. To address this limitation, a novel framework, InterCLIP-MEP, is proposed. This framework integrates Interactive CLIP (InterCLIP), which employs an efficient training strategy to derive enriched cross-modal representations by embedding inter-modal information directly into each encoder, while using approximately 20.6$\times$ fewer trainable parameters compared with existing state-of-the-art (SOTA) methods. Furthermore, a Memory-Enhanced Predictor (MEP) is introduced, featuring a dynamic…
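
The abstract says InterCLIP embeds inter-modal information directly into each encoder but does not give the mechanism here. As a rough illustration of the general idea only, the sketch below shows one generic way an encoder layer could see the other modality: its attention keys and values are extended with the other modality's tokens. Everything in it is an assumption for illustration: the function names `softmax` and `interactive_attention` are hypothetical, it uses a single head with identity projections instead of learned ones, and it should not be read as the authors' implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def interactive_attention(own_tokens, other_tokens):
    """Single-head scaled dot-product attention (hypothetical sketch).

    Queries come from this encoder's own tokens. Keys and values are the
    own tokens concatenated with tokens from the other modality, so each
    output mixes in cross-modal information. Identity projections are
    used for brevity; a real layer would use learned projections.
    """
    dim = len(own_tokens[0])
    context = own_tokens + other_tokens  # shared keys and values
    outputs = []
    for query in own_tokens:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in context
        ]
        weights = softmax(scores)
        outputs.append(
            [sum(w * value[d] for w, value in zip(weights, context))
             for d in range(dim)]
        )
    return outputs


# Toy inputs: two "text" tokens and one "image" token, all 2-dimensional.
text_tokens = [[1.0, 0.0], [0.0, 1.0]]
image_tokens = [[1.0, 1.0]]

fused = interactive_attention(text_tokens, image_tokens)
text_only = interactive_attention(text_tokens, [])
```

Comparing `fused` with `text_only` shows that the text representations change once image tokens are visible to the attention step, which is the only property this toy example is meant to demonstrate.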

Code & Models

Repositories

CoderChen01/InterCLIP-MEP
PyTorch (official)

Taxonomy

Methods: Contrastive Language-Image Pre-training