SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications

Jionghao Han; Jiatong Shi; Masao Someki; Yuxun Tang; Lan Liu; Yiwen Zhao; Wenhao Feng; Shinji Watanabe

arXiv:2511.20972 · cs.SD · December 25, 2025

TL;DR

SingingSDS introduces a novel spoken dialogue system that responds with singing instead of speech, enhancing engagement and emotional impact in character-based roleplay and entertainment scenarios.

Contribution

The paper presents a modular, open-source framework for singing-based dialogue responses. It integrates automatic speech recognition (ASR), a large language model (LLM), and singing voice synthesis (SVS) to support customizable, high-quality musical interactions.

Findings

01 Supports diverse character personas and musical styles

02 Offers a flexible, plug-and-play web demo

03 Enables expressive, memorable interactions

Abstract

With recent advances in automatic speech recognition (ASR), large language models (LLMs), and text-to-speech (TTS) technologies, spoken dialogue systems (SDS) have become widely accessible. However, most existing SDS are limited to conventional spoken responses. We present SingingSDS, a cascaded SDS that responds through singing rather than speaking, fostering more affective, memorable, and pleasurable interactions in character-based roleplay and interactive entertainment scenarios. SingingSDS employs a modular ASR-LLM-SVS pipeline and supports a wide range of configurations across character personas, ASR and LLM backends, SVS models, melody sources, and voice profiles, tailored to different needs in terms of latency, quality, and musical style. SingingSDS is available as a plug-and-play web demo, featuring modular, open-source code that supports customization and extension. Demo:…
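The cascaded ASR-LLM-SVS design described in the abstract can be pictured as three swappable stages plus a melody source. The following is a minimal sketch under stated assumptions, not the authors' implementation: every name in it (`PipelineConfig`, `respond`, the `stub_*` backends, the one-note-per-word alignment) is hypothetical and chosen only to illustrate how a configuration might select a persona, backends, melody source, and voice profile.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# NOTE: all names here are hypothetical illustrations, not the SingingSDS API.
Note = Tuple[str, int, float]  # (lyric token, MIDI pitch, duration in seconds)


@dataclass
class PipelineConfig:
    persona: str                                      # character persona prompt
    asr: Callable[[bytes], str]                       # audio -> transcript
    llm: Callable[[str, str], str]                    # (persona, transcript) -> lyrics
    melody: Callable[[int], List[Tuple[int, float]]]  # token count -> (pitch, duration) list
    svs: Callable[[List[Note], str], bytes]           # (score, voice) -> audio
    voice: str = "default"                            # voice profile identifier


def respond(audio: bytes, cfg: PipelineConfig) -> bytes:
    """Run one dialogue turn through the cascade: ASR, then LLM, then SVS."""
    transcript = cfg.asr(audio)
    lyrics = cfg.llm(cfg.persona, transcript)
    tokens = lyrics.split()
    contour = cfg.melody(len(tokens))
    # Assumption: one note per whitespace-separated token.
    score = [(tok, pitch, dur) for tok, (pitch, dur) in zip(tokens, contour)]
    return cfg.svs(score, cfg.voice)


# Stub backends so the sketch runs without any models installed.
def stub_asr(audio: bytes) -> str:
    return audio.decode("utf-8")


def stub_llm(persona: str, user_text: str) -> str:
    return f"la la {user_text}"


def stub_melody(n: int) -> List[Tuple[int, float]]:
    scale = [60, 62, 64, 65, 67]  # C major fragment as MIDI pitches
    return [(scale[i % len(scale)], 0.5) for i in range(n)]


def stub_svs(score: List[Note], voice: str) -> bytes:
    rendered = " ".join(f"{tok}@{pitch}" for tok, pitch, _ in score)
    return f"{voice}:{rendered}".encode("utf-8")


cfg = PipelineConfig(
    persona="cheerful bard",
    asr=stub_asr,
    llm=stub_llm,
    melody=stub_melody,
    svs=stub_svs,
)
out = respond(b"hello there", cfg)
```

Because each stage is just a callable in this sketch, trading latency for quality would amount to passing a different `asr`, `llm`, or `svs` function; how the actual system exposes that choice is not specified in the text above.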

Taxonomy

Topics: Speech and dialogue systems · AI in Service Interactions · Emotion and Mood Recognition