Variation and Synthetic Speech

Corey Miller, Orhan Karaali, and Noel Massey

arXiv:cmp-lg/9711004 · cmp-lg · May 23, 2007 · 3 cites

Open Access

TL;DR

This paper presents a neural network-based speech synthesis system that models linguistic and speaker-specific variation to produce more natural synthetic speech.

Contribution

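It introduces an architecture in which a neural network based postlexical module, trained on dictionary pronunciations aligned with hand-labeled narrow phonetic transcriptions, learns postlexical variation and can be retrained for each speaker being modeled.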

Findings

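1. The postlexical module learns an individual speaker's postlexical variation from dictionary pronunciations aligned with hand-labeled narrow phonetic transcriptions.

2. Learning variation in this way can make the resulting synthetic speech more natural.

3. The postlexical module can be retrained for each speaker whose voice is being modeled for synthesis.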

Abstract

We describe the approach to linguistic variation taken by the Motorola speech synthesizer. A pan-dialectal pronunciation dictionary is described, which serves as the training data for a neural network based letter-to-sound converter. Subsequent to dictionary retrieval or letter-to-sound generation, pronunciations are submitted to a neural network based postlexical module. The postlexical module has been trained on aligned dictionary pronunciations and hand-labeled narrow phonetic transcriptions. This architecture permits the learning of individual postlexical variation, and can be retrained for each speaker whose voice is being modeled for synthesis. Learning variation in this way can result in greater naturalness for the synthetic speech that is produced by the system.
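The flow described in the abstract (dictionary retrieval, falling back to letter-to-sound generation, followed by a postlexical pass) can be pictured with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the phone symbols, and the toy dictionary are hypothetical, and the two rule-based stand-ins take the place of the neural networks the paper describes. The flapping rule in particular is only a familiar example of postlexical variation, not a rule taken from the paper.

```python
from typing import Callable, Dict, List

Phones = List[str]


def pronounce(
    word: str,
    dictionary: Dict[str, Phones],
    letter_to_sound: Callable[[str], Phones],
    postlexical: Callable[[Phones], Phones],
) -> Phones:
    """Return a speaker-specific pronunciation for one word."""
    # Step 1: dictionary retrieval.
    lexical = dictionary.get(word.lower())
    # Step 2: fall back to letter-to-sound generation for unknown words.
    if lexical is None:
        lexical = letter_to_sound(word)
    # Step 3: every pronunciation then goes through the postlexical module.
    return postlexical(lexical)


# --- Hypothetical stand-ins for the trained components -------------------

TOY_DICTIONARY: Dict[str, Phones] = {"butter": ["b", "ah", "t", "er"]}
TOY_VOWELS = {"a", "ah", "er"}


def toy_letter_to_sound(word: str) -> Phones:
    # Placeholder: one "phone" per letter. The paper uses a neural network
    # trained on the pronunciation dictionary instead.
    return list(word.lower())


def toy_postlexical(phones: Phones) -> Phones:
    # Placeholder for a learned, speaker-specific mapping: replace "t"
    # between two vowels with a flap, written "dx" here.
    out = list(phones)
    for i in range(1, len(phones) - 1):
        if phones[i] == "t" and phones[i - 1] in TOY_VOWELS and phones[i + 1] in TOY_VOWELS:
            out[i] = "dx"
    return out


if __name__ == "__main__":
    for w in ("butter", "zat"):
        print(w, pronounce(w, TOY_DICTIONARY, toy_letter_to_sound, toy_postlexical))
```

Because the postlexical step is passed in as a function, modeling a different speaker in this sketch only means swapping that one argument, which mirrors the abstract's point that the postlexical module can be retrained per speaker while the dictionary and letter-to-sound stages stay pan-dialectal.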

Taxonomy

Topics: Speech Recognition and Synthesis · Natural Language Processing Techniques · Phonetics and Phonology Research