TL;DR
This paper introduces CFMS, a fine-grained Chinese multimodal sarcasm dataset with annotations for sarcasm, target, and explanation, and proposes a reinforcement learning strategy to improve sarcasm detection models.
Contribution
The paper constructs CFMS, the first fine-grained Chinese multimodal sarcasm dataset, annotated for sarcasm, target, and explanation, and introduces a novel RL-augmented in-context learning method for better sarcasm detection.
Findings
Fine-grained explanation annotations effectively guide AI in generating images with explicit sarcastic intent.
The parallel Chinese-English metaphor subset reveals model limitations in metaphoric reasoning.
The proposed PGDS method outperforms existing baselines in key tasks.
Abstract
Multimodal sarcasm detection has recently garnered significant attention. However, existing benchmarks suffer from coarse-grained annotations and limited cultural coverage, which hinder research into fine-grained semantic understanding. To address this, we construct CFMS, the first fine-grained multimodal sarcasm dataset tailored for Chinese social media. It comprises 2,796 high-quality image-text pairs and provides a triple-level annotation framework: sarcasm identification, target recognition, and explanation generation. We find that the fine-grained explanation annotations effectively guide AI in generating images with explicit sarcastic intent. Furthermore, we curate a high-consistency parallel Chinese-English metaphor subset (200 entries each), revealing significant limitations of current models in metaphoric reasoning. To overcome the constraints of traditional retrieval methods,…
