A GAN and LLM-Driven Data Augmentation Framework for Dynamic Linguistic Pattern Modeling in Chinese Sarcasm Detection
Wenxian Wang, Xiaohu Luo, Junfeng Hao, Xiaoming Gu, Xingshu Chen, Zhu Wang, Haizhou Wang

TL;DR
This paper introduces a GAN- and LLM-based data augmentation framework that improves Chinese sarcasm detection by modeling user-specific linguistic patterns and expanding the available data with synthesized sarcastic comments.
Contribution
It presents a novel approach combining GANs and GPT-3.5 for data augmentation and extends BERT to incorporate user behavior for improved sarcasm detection.
Findings
Achieved F1-scores of 0.9138 (non-sarcastic) and 0.9151 (sarcastic), outperforming existing methods.
Created SinaSarc, a large dataset with comments, context, and user behavior information.
Demonstrated the effectiveness of dynamic user pattern modeling in sarcasm detection.
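The F1-scores above are reported separately for the non-sarcastic and sarcastic classes. As a reminder of how a per-class F1 is computed, here is a minimal sketch in plain Python. The function name `per_class_f1` and the toy labels are assumptions made for illustration; they are not taken from the paper and do not reproduce its results.

```python
def per_class_f1(y_true, y_pred, label):
    """F1 for one class: harmonic mean of that class's precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
    fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
    fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
    if tp == 0:
        return 0.0  # no correct predictions for this class
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Toy labels invented for illustration (1 = sarcastic, 0 = non-sarcastic).
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

f1_sarcastic = per_class_f1(y_true, y_pred, 1)
f1_non_sarcastic = per_class_f1(y_true, y_pred, 0)
print(f1_sarcastic, f1_non_sarcastic)
```

Reporting both classes, rather than a single accuracy figure, shows whether a detector performs comparably on sarcastic and non-sarcastic comments.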
Abstract
Sarcasm is a rhetorical device that expresses criticism or emphasizes characteristics of certain individuals or situations through exaggeration, irony, or comparison. Existing methods for Chinese sarcasm detection are constrained by limited datasets and high construction costs, and they mainly focus on textual features, overlooking user-specific linguistic patterns that shape how opinions and emotions are expressed. This paper proposes a Generative Adversarial Network (GAN) and Large Language Model (LLM)-driven data augmentation framework to dynamically model users' linguistic patterns for enhanced Chinese sarcasm detection. First, we collect raw data from various topics on Sina Weibo. Then, we train a GAN on these data and apply a GPT-3.5 based data augmentation technique to synthesize an extended sarcastic comment dataset, named SinaSarc. This dataset contains target comments,…
