Integrating Feedback Loss from Bi-modal Sarcasm Detector for Sarcastic Speech Synthesis

Zhu Li; Yuqing Zhang; Xiyuan Gao; Devraj Raghuvanshi; Nagendra Kumar; Shekhar Nayak; Matt Coler

arXiv:2508.13028·cs.CL·August 19, 2025

TL;DR

This paper presents a novel sarcasm-aware speech synthesis method that integrates feedback from a bi-modal sarcasm detector and employs transfer learning to improve the naturalness and expressiveness of sarcastic speech generation.

Contribution

The paper introduces a feedback loss from a bi-modal sarcasm detector into TTS training and applies a two-stage transfer learning approach to improve sarcasm synthesis.
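The idea of adding a detector's feedback to a TTS objective can be illustrated with a minimal sketch. This is not the paper's implementation: the exact loss form and weighting are not given in this summary, so the binary cross-entropy term, the `feedback_weight` value, and the function names `binary_cross_entropy` and `total_loss` are all assumptions made for illustration. The sketch assumes a frozen detector that returns the probability that a synthesized utterance is sarcastic.

```python
import math


def binary_cross_entropy(p_sarcastic: float, target: float = 1.0,
                         eps: float = 1e-7) -> float:
    """Cross-entropy between the detector's probability and the target label.

    Assumption: the detector outputs a single probability in [0, 1];
    target=1.0 means the utterance is meant to sound sarcastic.
    """
    # Clamp to avoid log(0) when the detector is fully confident.
    p = min(max(p_sarcastic, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))


def total_loss(tts_loss: float, p_sarcastic: float, target: float = 1.0,
               feedback_weight: float = 0.1) -> float:
    """Hypothetical combined objective: TTS loss plus weighted feedback loss.

    feedback_weight is an illustrative hyperparameter, not a value
    taken from the paper.
    """
    feedback = binary_cross_entropy(p_sarcastic, target)
    return tts_loss + feedback_weight * feedback


if __name__ == "__main__":
    # Same TTS loss, but the detector is more or less convinced of sarcasm.
    print(total_loss(1.0, 0.9))
    print(total_loss(1.0, 0.2))
```

Under these assumptions, an utterance that the detector judges less sarcastic than intended incurs a larger total loss, so the synthesizer is pushed toward output the detector recognizes as sarcastic. In an actual training setup the same combination would be computed on differentiable tensors so that gradients flow from the detector back into the TTS model.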

Findings

01 Enhanced sarcasm detection in synthesized speech

02 Improved naturalness and expressiveness of sarcastic speech

03 Effective transfer learning strategy for sarcasm-aware TTS

Abstract

Sarcastic speech synthesis, which involves generating speech that effectively conveys sarcasm, is essential for enhancing natural interactions in applications such as entertainment and human-computer interaction. However, synthesizing sarcastic speech remains a challenge due to the nuanced prosody that characterizes sarcasm, as well as the limited availability of annotated sarcastic speech data. To address these challenges, this study introduces a novel approach that integrates feedback loss from a bi-modal sarcasm detection model into the TTS training process, enhancing the model's ability to capture and convey sarcasm. In addition, by leveraging transfer learning, a speech synthesis model pre-trained on read speech undergoes a two-stage fine-tuning process. First, it is fine-tuned on a diverse dataset encompassing various speech styles, including sarcastic speech. In the second stage,…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Phonetics and Phonology Research