Techniques and Challenges in Speech Synthesis

David Ferris

arXiv:1709.07552·cs.SD·September 25, 2017

TL;DR

This paper describes an English text-to-speech system built on diphone synthesis, covering diphone database creation, pronunciation prediction, and voice modulation, and evaluates the result for naturalness and intelligibility.

Contribution

It introduces a novel diphone-based speech synthesis system with automatic diphone extraction, a combined pitch and duration modification method, and a text processing pipeline for improved naturalness.

Findings

01 Diphone database creation in under 40 minutes

02 Enhanced voice naturalness through pitch and duration modulation

03 System tested for intelligibility and naturalness

Abstract

The aim of this project was to develop and implement an English language Text-to-Speech synthesis system. This involved a study of mechanisms of human speech production, a review of techniques in speech synthesis, and analysis of tests used to evaluate the effectiveness of synthesized speech. It was determined that a diphone synthesis system was the most effective choice for the scope of this project. A method of automatically identifying and extracting diphones from prompted speech was designed, allowing for the creation of a diphone database by a speaker in less than 40 minutes. CMUdict was used to determine the pronunciation of known words. A system for smoothing the transitions between diphone recordings was designed and implemented. CMUdict was then used to train a maximum-likelihood prediction system to determine the correct pronunciation of unknown English language alphabetic…
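As a rough illustration of the diphone approach the abstract describes, the sketch below turns a phoneme sequence (written with CMUdict-style symbols) into diphone names and joins recorded units with a short linear crossfade. The abstract does not say how the paper's smoothing works, so the crossfade here is a stand-in, not the author's method. The names `SILENCE`, `to_diphones`, and `crossfade_concat` are hypothetical, as is the `pau` silence label.

```python
from typing import List, Sequence

# Hypothetical label for the silence at the start and end of an utterance.
SILENCE = "pau"


def to_diphones(phonemes: Sequence[str]) -> List[str]:
    """Name the diphones (adjacent phoneme pairs) needed for an utterance.

    The sequence is padded with silence on both sides so the first and
    last phonemes also get a transition unit.
    """
    padded = [SILENCE, *phonemes, SILENCE]
    return [f"{a}-{b}" for a, b in zip(padded, padded[1:])]


def crossfade_concat(units: Sequence[Sequence[float]], overlap: int) -> List[float]:
    """Join sample lists, blending `overlap` samples at each boundary.

    Assumption: a plain linear crossfade, used here only as a simple
    example of smoothing a join between two recordings.
    """
    out: List[float] = []
    for unit in units:
        samples = list(unit)
        # Never blend more samples than either side actually has.
        n = min(overlap, len(out), len(samples))
        start = len(out) - n
        for i in range(n):
            w = (i + 1) / (n + 1)  # weight of the incoming unit rises across the overlap
            out[start + i] = (1 - w) * out[start + i] + w * samples[i]
        out.extend(samples[n:])
    return out


if __name__ == "__main__":
    # "hello" in CMUdict-style phonemes, stress markers omitted for brevity.
    print(to_diphones(["HH", "AH", "L", "OW"]))
    print(crossfade_concat([[1.0] * 4, [0.0] * 4], overlap=2))
```

With `overlap=0` the function reduces to plain concatenation, which makes it easy to compare joins with and without smoothing.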

Taxonomy

Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis · Speech and dialogue systems