Environment Aware Text-to-Speech Synthesis

Daxin Tan; Guangyan Zhang; Tan Lee

arXiv:2110.03887 · eess.AS · August 9, 2022 · 1 citation

Open Access

TL;DR

This paper introduces an environment-aware TTS system that models and incorporates acoustic environment factors to generate speech matching specific speaker and environment characteristics, leveraging heterogeneous speech data.

Contribution

It presents a novel neural network approach that disentangles speaker and environment factors in speech, enabling environment-aware speech synthesis from diverse data sources.

Findings

01. Effective disentanglement of speaker and environment factors.

02. Ability to synthesize speech with specified speaker and environment attributes.

03. Demonstrated improvements in speech quality and attribute control.

Abstract

This study aims at designing an environment-aware text-to-speech (TTS) system that can generate speech to suit specific acoustic environments. It is also motivated by the desire to leverage massive data of speech audio from heterogeneous sources in TTS system development. The key idea is to model the acoustic environment in speech audio as a factor of data variability and incorporate it as a condition in the process of neural network based speech synthesis. Two embedding extractors are trained with two purposely constructed datasets for characterization and disentanglement of speaker and environment factors in speech. A neural network model is trained to generate speech from extracted speaker and environment embeddings. Objective and subjective evaluation results demonstrate that the proposed TTS system is able to effectively disentangle speaker and environment factors and synthesize…
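The conditioning idea in the abstract can be illustrated with a minimal sketch. The abstract does not say how the speaker and environment embeddings are fed into the synthesis network, so the frame-wise concatenation below is an assumption, and every name and value in it (`condition_frames`, `speaker_a`, `env_clean`, `env_reverb`, the toy dimensions) is hypothetical rather than taken from the paper.

```python
from typing import List

Vector = List[float]


def condition_frames(text_frames: List[Vector],
                     speaker_emb: Vector,
                     env_emb: Vector) -> List[Vector]:
    """Append the same speaker and environment embeddings to every frame.

    Assumption: conditioning by concatenation. The paper may inject the
    embeddings differently; this only shows the two factors as separate,
    independently swappable inputs.
    """
    return [frame + speaker_emb + env_emb for frame in text_frames]


# Toy values invented for illustration only.
text_frames = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # 3 frames, dimension 2
speaker_a = [1.0, 0.0]                               # hypothetical speaker embedding
env_clean = [0.0, 0.0, 1.0]                          # hypothetical "clean" environment
env_reverb = [1.0, 0.0, 0.0]                         # hypothetical "reverberant" environment

# Same text and speaker, two different target environments.
clean_input = condition_frames(text_frames, speaker_a, env_clean)
reverb_input = condition_frames(text_frames, speaker_a, env_reverb)
```

Because the two embeddings occupy separate slots in each conditioned frame, swapping the environment embedding leaves the text and speaker parts unchanged, which is the kind of independent control the disentanglement is meant to provide.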

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing