AdaSpeech: Adaptive Text to Speech for Custom Voice

Mingjian Chen; Xu Tan; Bohan Li; Yanqing Liu; Tao Qin; Sheng Zhao; Tie-Yan Liu

arXiv:2103.00993 · eess.AS · March 2, 2021 · 79 cites

TL;DR

AdaSpeech is an adaptive TTS system that efficiently customizes high-quality voices for individual speakers using minimal data by employing novel acoustic encoding and conditional normalization techniques.

Contribution

The paper introduces AdaSpeech, a new adaptive TTS framework that effectively handles diverse acoustic conditions and reduces adaptation parameters for personalized voice synthesis.

Findings

01 Achieves better voice adaptation quality than baselines

02 Uses only about 5K parameters per speaker for customization

03 Effective with as few as 20 sentences of speech data
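
This page does not spell out how the conditional normalization works, so the following is only a minimal sketch under stated assumptions. It assumes a conditional layer normalization in which the scale and bias vectors are predicted from a speaker embedding by two linear projections, and that only the resulting vectors plus the embedding need to be kept per speaker. The names `ConditionalLayerNorm`, `speaker_params`, and `stored_params_per_speaker`, and all dimensions, are hypothetical and chosen for illustration.

```python
import math
import random


def layer_norm(x, scale, bias, eps=1e-5):
    """Normalize one hidden vector to zero mean / unit variance, then scale and shift."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [s * (v - mean) * inv_std + b for v, s, b in zip(x, scale, bias)]


def linear(weight, vec):
    """Matrix-vector product, with the weight stored as a list of rows."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


class ConditionalLayerNorm:
    """Layer norm whose scale and bias are predicted from a speaker embedding.

    Hypothetical sketch: the class name, shapes, and initialization are
    assumptions for illustration, not the paper's released code.
    """

    def __init__(self, hidden_dim, speaker_dim, rng):
        def init():
            return [[rng.gauss(0.0, 0.1) for _ in range(speaker_dim)]
                    for _ in range(hidden_dim)]
        self.w_scale = init()  # projects speaker embedding -> per-channel scale
        self.w_bias = init()   # projects speaker embedding -> per-channel bias

    def speaker_params(self, speaker_embedding):
        """Compute the scale and bias vectors for one speaker."""
        return (linear(self.w_scale, speaker_embedding),
                linear(self.w_bias, speaker_embedding))

    def __call__(self, x, speaker_embedding):
        scale, bias = self.speaker_params(speaker_embedding)
        return layer_norm(x, scale, bias)


def stored_params_per_speaker(hidden_dim, num_cond_norms, speaker_dim):
    """Count values kept per speaker if scale/bias vectors are cached after adaptation."""
    return num_cond_norms * 2 * hidden_dim + speaker_dim


rng = random.Random(0)
cln = ConditionalLayerNorm(hidden_dim=8, speaker_dim=4, rng=rng)
hidden = [rng.gauss(0.0, 1.0) for _ in range(8)]
speaker_a = [rng.gauss(0.0, 1.0) for _ in range(4)]
speaker_b = [rng.gauss(0.0, 1.0) for _ in range(4)]

out_a = cln(hidden, speaker_a)  # same hidden state, speaker A's scale and bias
out_b = cln(hidden, speaker_b)  # same hidden state, speaker B's scale and bias

# Assumed sizes (not from this page): 256-dim hidden, 8 conditional norms, 256-dim embedding.
print(stored_params_per_speaker(hidden_dim=256, num_cond_norms=8, speaker_dim=256))  # prints 4352
```

Under these assumed sizes the per-speaker count comes to a few thousand values, the same order of magnitude as the "about 5K" figure above. The exact layer counts and dimensions used in the paper are not given on this page, so the match is only indicative.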

Abstract

Custom voice, a specific text to speech (TTS) service in commercial speech platforms, aims to adapt a source TTS model to synthesize personal voice for a target speaker using few speech data. Custom voice presents two unique challenges for TTS adaptation: 1) to support diverse customers, the adaptation model needs to handle diverse acoustic conditions that could be very different from source speech data, and 2) to support a large number of customers, the adaptation parameters need to be small enough for each target speaker to reduce memory usage while maintaining high voice quality. In this work, we propose AdaSpeech, an adaptive TTS system for high-quality and efficient customization of new voices. We design several techniques in AdaSpeech to address the two challenges in custom voice: 1) To handle different acoustic conditions, we use two acoustic encoders to extract an…

Code & Models

Models

xihan123/so-vits-svc-5.0-nine
model · ♡ 5

Videos

AdaSpeech: Adaptive Text to Speech for Custom Voice · SlidesLive

Taxonomy

Topics: Speech Recognition and Synthesis · Music and Audio Processing · Speech and Audio Processing

Methods: Layer Normalization