Speech Synthesis with Neural Networks

Orhan Karaali; Gerald Corrigan; Ira Gerson

arXiv:cs/9811031·cs.NE·May 23, 2007·31 cites

Open Access

TL;DR

This paper presents a neural-network text-to-speech system that requires less memory than a concatenation system and performed well in tests against commercial systems using other technologies.

Contribution

It uses a time-delay neural network (TDNN) for the phonetic-to-acoustic mapping and a second neural network to control the timing of the generated speech.

Findings

01 Requires less memory than concatenation systems

02 Performs well in comparative tests with commercial systems

03 Demonstrates effective neural network application in speech synthesis

Abstract

Text-to-speech conversion has traditionally been performed either by concatenating short samples of speech or by using rule-based systems to convert a phonetic representation of speech into an acoustic representation, which is then converted into speech. This paper describes a system that uses a time-delay neural network (TDNN) to perform this phonetic-to-acoustic mapping, with another neural network to control the timing of the generated speech. The neural network system requires less memory than a concatenation system, and performed well in tests comparing it to commercial systems using other technologies.
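To make the idea of a time-delay layer concrete, here is a minimal sketch of a generic TDNN layer in plain Python. It is not the authors' network: the abstract does not give the architecture, so the layer sizes, the delay window `(-1, 0, 1)`, the `tanh` activation, the zero padding at utterance edges, and the names `tdnn_layer` and `random_weights` are all assumptions made for illustration. The point it shows is that the same weights are applied to a sliding window of neighbouring input frames, so each output frame depends on a short span of phonetic context.

```python
import math
import random

def tdnn_layer(frames, weights, bias, delays):
    """Apply one time-delay layer to a sequence of feature frames.

    frames:  list of T input vectors, each of length n_in
    weights: weights[d][j][i] links input i at offset delays[d] to output j
    bias:    list of n_out biases
    delays:  frame offsets relative to the current frame, e.g. (-1, 0, 1)
    """
    n_in = len(frames[0])
    zero = [0.0] * n_in
    outputs = []
    for t in range(len(frames)):
        acc = list(bias)
        for d, offset in enumerate(delays):
            idx = t + offset
            # Assumption: frames outside the utterance are treated as zeros.
            x = frames[idx] if 0 <= idx < len(frames) else zero
            for j in range(len(acc)):
                acc[j] += sum(w * xi for w, xi in zip(weights[d][j], x))
        # The same weights are reused at every time step t.
        outputs.append([math.tanh(a) for a in acc])
    return outputs

def random_weights(n_delays, n_out, n_in, seed=0):
    """Hypothetical untrained weights, only to make the sketch runnable."""
    rng = random.Random(seed)
    return [[[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
             for _ in range(n_out)]
            for _ in range(n_delays)]

# Toy input: 5 frames, each with 3 made-up phonetic features.
frames = [
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
delays = (-1, 0, 1)
weights = random_weights(len(delays), n_out=2, n_in=3)
bias = [0.0, 0.0]

# One 2-dimensional "acoustic" vector per input frame.
acoustic = tdnn_layer(frames, weights, bias, delays)
```

In a system like the one described, the timing of the generated speech would come from the separate timing network; this sketch simply assumes one input frame per output frame.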

Taxonomy

Topics: Speech Recognition and Synthesis · Phonetics and Phonology Research · Speech and Audio Processing