Scaling and bias codes for modeling speaker-adaptive DNN-based speech synthesis systems

Hieu-Thi Luong; Junichi Yamagishi

arXiv:1807.11632·eess.AS·October 2, 2018

Open Access

TL;DR

This paper introduces a unified framework using scaling and bias codes for speaker-adaptive DNN speech synthesis, combining advantages of existing methods and improving adaptation performance.

Contribution

It proposes scaling and bias codes as a generalized speaker-adaptive transformation that unifies the layer-based and input-code approaches to neural-network speaker adaptation.

Findings

01. Improved speaker adaptation performance over conventional input code methods.

02. Unified framework captures benefits of layer-based and input-code approaches.

03. Efficient factorized speaker-adaptive model demonstrated.

Abstract

Most neural-network based speaker-adaptive acoustic models for speech synthesis can be categorized into either layer-based or input-code approaches. Although both approaches have their own pros and cons, most existing works on speaker adaptation focus on improving one or the other. In this paper, after we first systematically overview the common principles of neural-network based speaker-adaptive models, we show that these approaches can be represented in a unified framework and can be generalized further. More specifically, we introduce the use of scaling and bias codes as generalized means for speaker-adaptive transformation. By utilizing these codes, we can create a more efficient factorized speaker-adaptive model and capture advantages of both approaches while reducing their disadvantages. The experiments show that the proposed method can improve the performance of speaker…
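
To make the idea of scaling and bias codes concrete, here is a minimal sketch of one hidden layer whose activations are scaled and shifted by speaker-specific codes. Everything in it is an illustrative assumption rather than the paper's exact formulation: the function name `adaptive_layer`, the projection matrices `A` and `B`, the `tanh` activation, and the choice of `1 + A s` for the scale (so that an all-zero code leaves the layer unchanged). The helper `input_code_layer` is likewise hypothetical and shows the conventional input-code approach, in which the speaker code is appended to the layer input.

```python
import math


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def adaptive_layer(x, W, c, A, B, scaling_code, bias_code):
    """Hidden layer with a speaker-dependent scale and bias (illustrative).

    h = tanh(a * (W x + c) + b), with a = 1 + A s_a and b = B s_b.
    The "1 +" offset is an assumption made here so that zero codes
    reduce to the plain speaker-independent layer.
    """
    z = [wx + ci for wx, ci in zip(matvec(W, x), c)]
    a = [1.0 + v for v in matvec(A, scaling_code)]  # per-unit scale
    b = matvec(B, bias_code)                        # per-unit bias
    return [math.tanh(ai * zi + bi) for ai, zi, bi in zip(a, z, b)]


def input_code_layer(x, W, c, B, code):
    """Input-code layer: the code is concatenated to the input and
    multiplied by the concatenated weight matrix [W | B]."""
    W_cat = [w_row + b_row for w_row, b_row in zip(W, B)]
    z = matvec(W_cat, x + code)
    return [math.tanh(zi + ci) for zi, ci in zip(z, c)]


# Toy shapes: 2 inputs, 3 hidden units, 2-dimensional codes.
x = [0.5, -1.0]
W = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
c = [0.1, 0.2, 0.3]
A = [[0.5, 0.0], [0.0, 0.5], [0.2, 0.2]]
B = [[0.3, -0.1], [0.0, 0.4], [-0.2, 0.2]]
speaker_scale = [1.0, 0.0]
speaker_bias = [0.0, 1.0]

h = adaptive_layer(x, W, c, A, B, speaker_scale, speaker_bias)
print(h)
```

Under these assumptions, setting the scaling code to zero makes `adaptive_layer` produce the same output as `input_code_layer`, which illustrates how an input code behaves like a speaker-dependent bias on the hidden layer; the scaling code adds a multiplicative adjustment on top of that.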

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing