ViSP: A PPO-Driven Framework for Sarcasm Generation with Contrastive Learning
Changli Wang, Rui Wu, Fang Yin

TL;DR
This paper introduces ViSP, a framework that combines PPO and contrastive learning to generate sarcastic text, together with M2SaG, a new multimodal dataset on which ViSP outperforms all baseline models.
Contribution
The paper presents ViSP, a new PPO-driven framework that leverages contrastive learning for sarcasm generation, and introduces M2SaG, a multimodal dataset for this task.
Findings
ViSP surpasses all baselines in sarcasm generation metrics.
ViSP-generated texts score higher on Sarcasm Score and Factual Incongruity.
The dataset and code will be publicly released.
Abstract
Human emotions are complex, with sarcasm being a subtle and distinctive form. Despite progress in sarcasm research, sarcasm generation remains underexplored, primarily due to the overreliance on textual modalities and the neglect of visual cues, as well as the mismatch between image content and sarcastic intent in existing datasets. In this paper, we introduce M2SaG, a multimodal sarcasm generation dataset with 4,970 samples, each containing an image, a sarcastic text, and a sarcasm target. To benchmark M2SaG, we propose ViSP, a generation framework that integrates Proximal Policy Optimization (PPO) and contrastive learning. PPO utilizes reward scores from DIP to steer the generation of sarcastic texts, while contrastive learning encourages the model to favor outputs with higher reward scores. These strategies improve overall generation quality and produce texts with more pronounced…
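The abstract names two training signals but does not give their formulas. The sketch below shows one common way such signals are written, under stated assumptions: it uses the standard PPO clipped surrogate objective and a generic pairwise margin ranking loss as a stand-in for the contrastive term. The function names, the `margin` and `clip_eps` values, and the use of plain floats for rewards (which in the paper come from DIP) are hypothetical and are not taken from the paper.

```python
import math


def ppo_clipped_objective(logp_new, logp_old, advantage, clip_eps=0.2):
    """Standard PPO clipped surrogate for one sampled text (to be maximized).

    logp_new / logp_old: log-probability of the text under the current and
    the sampling policy. advantage: reward-derived advantage estimate.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
    return min(ratio * advantage, clipped_ratio * advantage)


def pairwise_preference_loss(logps, rewards, margin=0.1):
    """Generic margin ranking loss over candidate texts (assumed form).

    For every pair where candidate i has a higher reward than candidate j,
    penalize the model unless logps[i] exceeds logps[j] by at least `margin`.
    """
    total, pairs = 0.0, 0
    for i in range(len(logps)):
        for j in range(len(logps)):
            if rewards[i] > rewards[j]:
                total += max(0.0, margin - (logps[i] - logps[j]))
                pairs += 1
    return total / pairs if pairs else 0.0


if __name__ == "__main__":
    # Hypothetical numbers: two candidate texts with reward scores 0.9 and 0.1.
    print(ppo_clipped_objective(logp_new=-1.0, logp_old=-1.2, advantage=0.5))
    print(pairwise_preference_loss(logps=[-2.0, -1.0], rewards=[0.9, 0.1]))
```

In this sketch the ranking loss is zero once the higher-reward candidate is already more likely by the margin, so it only pushes on pairs the model currently orders incorrectly.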
