Using a Pitch-Synchronous Residual Codebook for Hybrid HMM/Frame Selection Speech Synthesis

Thomas Drugman; Alexis Moinet; Thierry Dutoit; Geoffrey Wilfart

arXiv:1912.12887·cs.SD·January 1, 2020

TL;DR

This paper builds the excitation of an HMM-based speech synthesizer from a codebook of pitch-synchronous residual frames, giving a more realistic source signal and better perceived quality than the basic technique.

Contribution

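It combines HMM-based parameter generation with frame selection: the HMMs generate the filter coefficients, the pitch, and a compact representation of each target residual frame, and the source signal is formed by concatenating frames chosen from the residual codebook by a selection criterion.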

Findings

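01. Subjective tests show a relevant quality improvement over the basic technique.

02. A limited codebook of typical pitch-synchronous residual frames is enough to construct a more realistic source signal.

03. Synthesized speech is more natural than with the baseline method.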

Abstract

This paper proposes a method to improve the quality delivered by statistical parametric speech synthesizers. To this end, we use a codebook of pitch-synchronous residual frames to construct a more realistic source signal. First, a limited codebook of typical excitations is built from a training database. During synthesis, HMMs are used to generate filter and source coefficients. The source coefficients contain both the pitch and a compact representation of the target residual frames. The source signal is obtained by concatenating excitation frames picked from the codebook, based on a selection criterion that takes the target residual coefficients as input. Subjective results show a relevant improvement compared to the basic technique.
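The pipeline described in the abstract (build a codebook, select a frame per target, concatenate) can be illustrated with a minimal sketch. The abstract does not specify the compact representation, the selection criterion, or how frames are fitted to the target pitch, so everything below is an assumption made for illustration: frames are taken to be one length-normalised pitch cycle each, the compact representation is a PCA projection, selection is nearest neighbour in Euclidean distance, and pitch adaptation is linear-interpolation resampling. The function names (build_codebook, select_frame, resample, synthesize_excitation) are hypothetical and not taken from the paper.

```python
import numpy as np


def build_codebook(frames, n_coeffs=4):
    """Store residual frames with a compact description of each one.

    frames: (n_frames, frame_len) array of length-normalised residual frames.
    ASSUMPTION: PCA stands in for the paper's unspecified compact representation.
    """
    mean = frames.mean(axis=0)
    # PCA via SVD of the centred frames; rows of vt are principal directions.
    _, _, vt = np.linalg.svd(frames - mean, full_matrices=False)
    basis = vt[:n_coeffs]
    coeffs = (frames - mean) @ basis.T
    return {"frames": frames, "coeffs": coeffs, "mean": mean, "basis": basis}


def select_frame(codebook, target_coeffs):
    """Pick the codebook frame whose coefficients are closest to the target.

    ASSUMPTION: Euclidean nearest neighbour as the selection criterion.
    """
    dist = np.linalg.norm(codebook["coeffs"] - target_coeffs, axis=1)
    return codebook["frames"][np.argmin(dist)]


def resample(frame, new_len):
    """Stretch or compress a frame to new_len samples by linear interpolation."""
    old_grid = np.linspace(0.0, 1.0, num=len(frame))
    new_grid = np.linspace(0.0, 1.0, num=new_len)
    return np.interp(new_grid, old_grid, frame)


def synthesize_excitation(codebook, target_coeffs_seq, pitch_periods):
    """Concatenate one selected frame per pitch period (periods in samples)."""
    pieces = [
        resample(select_frame(codebook, target), int(period))
        for target, period in zip(target_coeffs_seq, pitch_periods)
    ]
    return np.concatenate(pieces)
```

In a full synthesizer the target coefficients and pitch periods would come from the HMMs, and the resulting excitation would be passed through the synthesis filter; both steps are outside this sketch.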

Taxonomy

Topics: Speech Recognition and Synthesis · Speech and Audio Processing · Music and Audio Processing