Generative Expressive Conversational Speech Synthesis
Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li

TL;DR
This paper introduces GPT-Talker, a generative system for conversational speech synthesis that uses multimodal dialogue context and large-scale natural datasets to produce more natural and expressive speech in multi-turn conversations.
Contribution
The paper presents a novel generative CSS system leveraging GPT and a large-scale natural dialogue dataset, improving naturalness and expressiveness over existing methods.
Findings
Outperforms state-of-the-art CSS systems in naturalness and expressiveness.
Effectively integrates multimodal dialogue context into speech synthesis.
Demonstrates robustness across Chinese and English conversational data.
Abstract
Conversational Speech Synthesis (CSS) aims to express a target utterance with the proper speaking style in a user-agent conversation setting. Existing CSS methods employ effective multi-modal context modeling techniques to achieve empathy understanding and expression. However, they often need to design complex network architectures and meticulously optimize the modules within them. In addition, due to the limitations of small-scale datasets containing scripted recording styles, they often fail to simulate real natural conversational styles. To address the above issues, we propose a novel generative expressive CSS system, termed GPT-Talker. We transform the multimodal information of the multi-turn dialogue history into discrete token sequences and seamlessly integrate them to form a comprehensive user-agent dialogue context. Leveraging the power of GPT, we predict the token sequence, that…
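The context-building step described in the abstract can be illustrated with a minimal sketch. It assumes each turn has already been converted to discrete text and speech tokens, and it uses hypothetical marker tokens (`<user>`, `<agent>`, `<text>`, `<speech>`) and a hypothetical `build_context` helper. None of these names come from the paper, and the authors' actual token layout may differ.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical marker tokens; the paper's real vocabulary is not given here.
USER, AGENT, TEXT, SPEECH = "<user>", "<agent>", "<text>", "<speech>"


@dataclass
class Turn:
    speaker: str              # "user" or "agent"
    text_tokens: List[str]    # discrete text tokens for the turn
    speech_tokens: List[str]  # discrete speech tokens for the turn


def build_context(history: List[Turn], target_speaker: str,
                  target_text: List[str]) -> List[str]:
    """Flatten a multi-turn dialogue into one token sequence.

    Each past turn contributes a speaker marker, its text tokens, and its
    speech tokens. The target utterance contributes only its text, followed
    by a trailing <speech> marker that an autoregressive model would
    continue from.
    """
    seq: List[str] = []
    for turn in history:
        seq.append(USER if turn.speaker == "user" else AGENT)
        seq.append(TEXT)
        seq.extend(turn.text_tokens)
        seq.append(SPEECH)
        seq.extend(turn.speech_tokens)
    seq.append(USER if target_speaker == "user" else AGENT)
    seq.append(TEXT)
    seq.extend(target_text)
    seq.append(SPEECH)  # the model would generate speech tokens after this
    return seq


history = [Turn("user", ["hi"], ["s1", "s2"])]
prompt = build_context(history, "agent", ["hello"])
```

In this sketch the prompt ends with the speech marker for the target turn, so a GPT-style decoder trained on such sequences would be asked to continue with the target utterance's speech tokens, conditioned on both the text and the speech of earlier turns.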
Taxonomy
Topics: Speech and dialogue systems
Methods: Attention Is All You Need · Cosine Annealing · Softmax · Dense Connections · Dropout · Linear Layer · Attention Dropout · Residual Connection · Linear Warmup With Cosine Annealing
