Leveraging Large Language Models for Sarcastic Speech Annotation in Sarcasm Detection
Zhu Li, Yuqing Zhang, Xiyuan Gao, Shekhar Nayak, Matt Coler

TL;DR
This paper introduces PodSarc, a large-scale sarcastic speech dataset built with an annotation pipeline that combines LLM labeling with human verification, and pairs it with a collaborative gating detection model to improve sarcasm detection in speech.
Contribution
It presents a novel annotation pipeline leveraging LLMs and human verification, and introduces PodSarc, a new dataset for sarcasm detection in speech.
Findings
The detection model achieves a 73.63% F1 score.
LLMs (GPT-4o and LLaMA 3) produce usable initial sarcasm annotations when human verification resolves their disagreements.
PodSarc dataset provides a new benchmark for sarcasm detection.
Abstract
Sarcasm fundamentally alters meaning through tone and context, yet detecting it in speech remains a challenge due to data scarcity. In addition, existing detection systems often rely on multimodal data, limiting their applicability in contexts where only speech is available. To address this, we propose an annotation pipeline that leverages large language models (LLMs) to generate a sarcasm dataset. Using a publicly available sarcasm-focused podcast, we employ GPT-4o and LLaMA 3 for initial sarcasm annotations, followed by human verification to resolve disagreements. We validate this approach by comparing annotation quality and detection performance on a publicly available sarcasm dataset using a collaborative gating architecture. Finally, we introduce PodSarc, a large-scale sarcastic speech dataset created through this pipeline. The detection model achieves a 73.63% F1 score,…
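The annotation step described in the abstract (two LLMs label each utterance, and humans resolve the cases where they disagree) can be illustrated with a minimal sketch. This is not the authors' code: the function names `merge_llm_labels` and `apply_human_labels` are hypothetical, the labels are assumed to be binary (1 = sarcastic, 0 = not sarcastic), and the model calls that would produce the two label lists are left out entirely.

```python
def merge_llm_labels(labels_a, labels_b):
    """Keep labels where both annotators agree; flag the rest for human review.

    labels_a, labels_b: per-utterance labels from two LLM annotators
    (assumed binary here: 1 = sarcastic, 0 = not sarcastic).
    Returns (merged, needs_review), where merged holds None at every
    index listed in needs_review.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same utterances")
    merged = []
    needs_review = []
    for i, (a, b) in enumerate(zip(labels_a, labels_b)):
        if a == b:
            merged.append(a)
        else:
            merged.append(None)      # unresolved until a human decides
            needs_review.append(i)
    return merged, needs_review


def apply_human_labels(merged, human_labels):
    """Fill unresolved slots using a dict of {utterance index: human label}."""
    return [human_labels[i] if label is None else label
            for i, label in enumerate(merged)]


# Hypothetical labels for four utterances from the two models.
gpt_labels = [1, 0, 1, 0]
llama_labels = [1, 1, 1, 0]
merged, needs_review = merge_llm_labels(gpt_labels, llama_labels)
final_labels = apply_human_labels(merged, {1: 0})
```

Only the utterances in `needs_review` reach a human annotator, which is what makes a pipeline of this shape cheaper than labeling every utterance by hand.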
