AS-Speech: Adaptive Style For Speech Synthesis

Zhipeng Li; Xiaofen Xing; Jun Wang; Shuaiqi Chen; Guoqiao Yu; Guanglu Wan; Xiangmin Xu

arXiv:2409.05730 · eess.AS · September 10, 2024 · SLT


Open Access

TL;DR

AS-Speech introduces a unified adaptive style framework for TTS that combines fine-grained timbre and rhythm features, resulting in more natural and speaker-similar synthesized speech.

Contribution

The paper presents a novel adaptive style method integrating timbre and rhythm into a single model for improved speech synthesis.

Findings

01. Produces more natural speech with higher fidelity.
02. Achieves better speaker similarity in style.
03. Outperforms existing adaptive TTS models.

Abstract

In recent years, there has been significant progress in Text-to-Speech (TTS) synthesis technology, enabling the high-quality synthesis of voices in common scenarios. In unseen situations, adaptive TTS requires a strong generalization capability to speaker style characteristics. However, the existing adaptive methods can only extract and integrate coarse-grained timbre or mixed rhythm attributes separately. In this paper, we propose AS-Speech, an adaptive style methodology that integrates the speaker timbre characteristics and rhythmic attributes into a unified framework for text-to-speech synthesis. Specifically, AS-Speech can accurately simulate style characteristics through fine-grained text-based timbre features and global rhythm information, and achieve high-fidelity speech synthesis through the diffusion model. Experiments show that the proposed model produces voices with higher…

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and dialogue systems · Natural Language Processing Techniques

Methods: Diffusion