Chain-Talker: Chain Understanding and Rendering for Empathetic Conversational Speech Synthesis

Yifan Hu; Rui Liu; Yi Ren; Xiang Yin; Haizhou Li

arXiv:2505.12597·cs.SD·May 20, 2025

TL;DR

Chain-Talker introduces a three-stage framework for empathetic conversational speech synthesis that improves emotional expressiveness and interpretability by mimicking human cognition and utilizing an LLM-driven emotion captioning pipeline.

Contribution

The paper presents a three-stage framework for conversational speech synthesis (CSS) that enhances emotional perception and interpretability, supported by an LLM-based emotion captioning pipeline.

Findings

01 Outperforms existing methods in producing expressive, empathetic speech
02 Uses CSS-EmCap for reliable emotion modeling
03 Demonstrates effectiveness on three benchmark datasets

Abstract

Conversational Speech Synthesis (CSS) aims to align synthesized speech with the emotional and stylistic context of user-agent interactions to achieve empathy. Current generative CSS models face interpretability limitations due to insufficient emotional perception and redundant discrete speech coding. To address the above issues, we present Chain-Talker, a three-stage framework mimicking human cognition: Emotion Understanding derives context-aware emotion descriptors from dialogue history; Semantic Understanding generates compact semantic codes via serialized prediction; and Empathetic Rendering synthesizes expressive speech by integrating both components. To support emotion modeling, we develop CSS-EmCap, an LLM-driven automated pipeline for generating precise conversational speech emotion captions. Experiments on three benchmark datasets demonstrate that Chain-Talker produces more…
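
The three-stage chain named in the abstract can be pictured as a simple data flow. The sketch below is illustrative only and is not the authors' implementation: every function name, signature, and the `Turn` type are hypothetical, the stage bodies are trivial placeholders rather than models, and the exact inputs each stage receives are an assumption drawn from the abstract's wording.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Turn:
    """One utterance in the dialogue history (hypothetical type)."""
    speaker: str
    text: str


def emotion_understanding(history: List[Turn]) -> str:
    # Stage 1 placeholder: a real system would derive a context-aware
    # emotion descriptor with a learned model. This stub only reports
    # how much context it was given.
    return f"emotion descriptor conditioned on {len(history)} prior turns"


def semantic_understanding(history: List[Turn], reply_text: str) -> List[int]:
    # Stage 2 placeholder: a real system would predict compact semantic
    # codes. This stub emits one dummy integer per word (its length).
    # Conditioning on `history` is an assumption; the stub ignores it.
    return [len(word) for word in reply_text.split()]


def empathetic_rendering(emotion_caption: str, codes: List[int]) -> Dict[str, object]:
    # Stage 3 placeholder: a real system would synthesize a waveform from
    # both inputs. This stub just bundles them to show the data flow.
    return {"emotion": emotion_caption, "codes": codes}


def chain_talker_sketch(history: List[Turn], reply_text: str) -> Dict[str, object]:
    # Run the stages in order: understand emotion, understand semantics,
    # then render using both results.
    caption = emotion_understanding(history)
    codes = semantic_understanding(history, reply_text)
    return empathetic_rendering(caption, codes)


if __name__ == "__main__":
    history = [Turn("user", "I failed my exam."), Turn("agent", "Oh no.")]
    print(chain_talker_sketch(history, "I am sorry to hear that"))
```

The only point of the sketch is the ordering: the rendering step consumes the outputs of both understanding steps, which is what makes the intermediate emotion descriptor inspectable.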

Code & Models

Repositories

ai-s2-lab/chain-talker (official)

Taxonomy

Topics: Emotion and Mood Recognition · Sentiment Analysis and Opinion Mining · Topic Modeling