LSTM Deep Neural Networks Postfiltering for Improving the Quality of Synthetic Voices

Marvin Coto-Jiménez; John Goddard-Close

arXiv:1602.02656·cs.SD·February 9, 2016

TL;DR

This paper introduces an LSTM-based postfiltering method to enhance the spectral quality of HMM-based synthetic speech, making it more similar to natural human speech.

Contribution

The paper proposes using LSTM deep neural networks as a postfilter to improve spectral characteristics in HMM-based speech synthesis systems.

Findings

01 LSTM postfiltering improved the quality of HMM-based voices.

02 The postfiltered speech had spectral characteristics closer to those of natural speech.

03 The approach shows potential for making synthetic speech sound more natural.

Abstract

Recent developments in speech synthesis have produced systems capable of producing intelligible speech, but researchers now strive to create models that more accurately mimic human voices. One such development is the incorporation of multiple linguistic styles in various languages and accents. HMM-based speech synthesis is of great interest to many researchers because of its ability to produce sophisticated features with a small footprint. Despite such progress, its quality has not yet reached the level of the predominant unit-selection approaches, which choose and concatenate recordings of real speech. Recent efforts have been made to improve these systems. In this paper we present the application of Long Short-Term Memory deep neural networks as a postfiltering step in HMM-based speech synthesis, in order to obtain closer spectral characteristics to those of natural…
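To make the postfiltering idea concrete, the following is a minimal sketch of the forward pass of a single LSTM layer applied to a sequence of spectral frames. It is an illustration under stated assumptions, not the authors' implementation: the function names (`init_params`, `lstm_postfilter`), the layer sizes, the linear output layer, and the random untrained weights are all hypothetical. It assumes that the postfilter takes frames of spectral parameters from the HMM-based voice as input and that, after training on time-aligned natural speech (not shown), its outputs would approximate the natural parameters.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    # Plain matrix-vector product on nested lists.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def init_params(n_in, n_hidden, n_out, seed=0):
    # Hypothetical, untrained parameters: small random weights, zero biases.
    rng = random.Random(seed)
    def rand(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {
        "n_hidden": n_hidden,
        "W": rand(4 * n_hidden, n_in + n_hidden),  # stacked gate weights
        "b": [0.0] * (4 * n_hidden),
        "V": rand(n_out, n_hidden),                # linear output layer
        "c": [0.0] * n_out,
    }

def lstm_postfilter(frames, params):
    """Map a sequence of synthetic spectral frames to 'enhanced' frames."""
    n = params["n_hidden"]
    h = [0.0] * n      # hidden state carried across frames
    cell = [0.0] * n   # cell state (the LSTM's memory)
    enhanced = []
    for x in frames:
        pre = matvec(params["W"], list(x) + h)
        z = [p + b for p, b in zip(pre, params["b"])]
        # Split pre-activations into input, forget, candidate, output gates.
        i, f, g, o = z[:n], z[n:2 * n], z[2 * n:3 * n], z[3 * n:]
        cell = [sigmoid(fk) * ck + sigmoid(ik) * math.tanh(gk)
                for fk, ck, ik, gk in zip(f, cell, i, g)]
        h = [sigmoid(ok) * math.tanh(ck) for ok, ck in zip(o, cell)]
        out = matvec(params["V"], h)
        enhanced.append([v + c for v, c in zip(out, params["c"])])
    return enhanced

# Usage: 5 frames of 8 hypothetical spectral coefficients each.
params = init_params(n_in=8, n_hidden=4, n_out=8)
synthetic = [[0.1 * k for k in range(8)] for _ in range(5)]
filtered = lstm_postfilter(synthetic, params)
print(len(filtered), len(filtered[0]))
```

Because the hidden and cell states persist from frame to frame, each output frame depends on the preceding frames as well as the current one, which is the property that motivates a recurrent network rather than a frame-by-frame mapping.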

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing