A High Quality Text-To-Speech System Composed of Multiple Neural Networks

Orhan Karaali; Gerald Corrigan; Noel Massey; Corey Miller; Otto Schnurr; Andrew Mackie

arXiv:cs/9812006·cs.NE·November 17, 2016

TL;DR

This paper presents a text-to-speech system built from neural networks throughout, with linguistic, acoustic, and visual modules that produce speech and a synchronized talking-head animation and that can be retrained for different voices and languages.

Contribution

It introduces what the authors describe as the first TTS system to use neural networks throughout, with separate modules for linguistic, acoustic, and visual processing that can be retrained on different voices and languages.

Findings

01

Achieved high-quality speech synthesis with neural networks.

02

Enabled multilingual and multi-voice adaptability.

03

Integrated visual animation for talking head synchronization.

Abstract

While neural networks have been employed to handle several different text-to-speech tasks, ours is the first system to use neural networks throughout, for both linguistic and acoustic processing. We divide the text-to-speech task into three subtasks: a linguistic module mapping from text to a linguistic representation, an acoustic module mapping from the linguistic representation to speech, and a video module mapping from the linguistic representation to animated images. The linguistic module employs a letter-to-sound neural network and a postlexical neural network. The acoustic module employs a duration neural network and a phonetic neural network. The visual neural network is employed in parallel to the acoustic module to drive a talking head. The use of neural networks that can be retrained on the characteristics of different voices and languages affords our system a degree of…
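
The module layout described in the abstract can be sketched as a simple data-flow composition. This is only an illustrative sketch, not the authors' implementation: the class name `TTSPipeline`, the function `build_toy_pipeline`, the list-based representations, and the toy stand-in functions are all assumptions made here for illustration. The abstract does not specify how the linguistic representation, speech output, or animation frames are encoded, and each stand-in below would be a trained neural network in the actual system.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical representation: the abstract does not say how the
# linguistic representation is encoded, so a list of symbols is assumed.
Linguistic = List[str]


@dataclass
class TTSPipeline:
    """Wires together the five networks named in the abstract (as callables)."""
    letter_to_sound: Callable[[str], Linguistic]               # linguistic module, stage 1
    postlexical: Callable[[Linguistic], Linguistic]            # linguistic module, stage 2
    duration: Callable[[Linguistic], List[float]]              # acoustic module, stage 1
    phonetic: Callable[[Linguistic, List[float]], List[float]]  # acoustic module, stage 2
    visual: Callable[[Linguistic], List[str]]                  # video module

    def linguistic(self, text: str) -> Linguistic:
        # Text -> linguistic representation.
        return self.postlexical(self.letter_to_sound(text))

    def synthesize(self, text: str) -> Tuple[List[float], List[str]]:
        rep = self.linguistic(text)
        # Acoustic branch: durations first, then the phonetic network.
        durations = self.duration(rep)
        speech = self.phonetic(rep, durations)
        # Visual branch consumes the same linguistic representation,
        # in parallel to the acoustic branch.
        frames = self.visual(rep)
        return speech, frames


def build_toy_pipeline() -> TTSPipeline:
    """Toy stand-ins (assumptions) so the data flow can be run end to end."""
    return TTSPipeline(
        letter_to_sound=lambda text: [c for c in text.lower() if c.isalpha()],
        postlexical=lambda symbols: list(symbols),
        duration=lambda symbols: [0.1] * len(symbols),
        phonetic=lambda symbols, durs: list(durs),
        visual=lambda symbols: [f"frame:{s}" for s in symbols],
    )


if __name__ == "__main__":
    speech, frames = build_toy_pipeline().synthesize("Hi")
    print(len(speech), frames)
```

Keeping each network behind a plain callable mirrors the retraining idea in the abstract: under this sketch, adapting to a new voice or language would mean swapping in differently trained callables without changing the wiring.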
