Eigenresiduals for improved Parametric Speech Synthesis
Thomas Drugman, Geoffrey Wilfart, Thierry Dutoit

TL;DR
This paper introduces a PCA-based eigenresidual excitation model for parametric speech synthesis that reduces buzziness and improves sound quality while maintaining a small synthesis engine footprint.
Contribution
The paper proposes a PCA-based excitation model for HMM-based speech synthesis: pitch-synchronous residual frames are decomposed on an orthonormal basis of eigenresiduals, which reduces buzziness compared with the traditional excitation.
Findings
Improved speech quality with eigenresiduals over traditional excitation.
Synthesis engine footprint remains under about 1Mb.
Effective reduction of buzziness in synthesized speech.
Abstract
Statistical parametric speech synthesizers have recently shown their ability to produce natural-sounding and flexible voices. Unfortunately, the delivered quality suffers from a typical buzziness because the speech is vocoded. This paper proposes a new excitation model to reduce this undesirable effect. The model is based on the decomposition of pitch-synchronous residual frames on an orthonormal basis obtained by Principal Component Analysis. This basis contains a limited number of eigenresiduals and is computed on a relatively small speech database. A stream of PCA-based coefficients is added to our HMM-based synthesizer and is used to generate the voiced excitation during synthesis. An improvement over the traditional excitation is reported, while the synthesis engine footprint remains under about 1Mb.
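The decomposition described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the frames here are synthetic random data rather than real pitch-synchronous residuals, the frame length (64) and number of components (5) are arbitrary placeholders, and the function names `fit_eigenresiduals`, `encode`, and `decode` are hypothetical. It assumes the residual frames have already been extracted and brought to a common length.

```python
import numpy as np

def fit_eigenresiduals(frames, n_components):
    """Return (mean, basis); the rows of basis are orthonormal eigenresiduals."""
    mean = frames.mean(axis=0)
    centered = frames - mean
    # Rows of vt are the principal directions, sorted by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:n_components]

def encode(frame, mean, basis):
    """Project one frame onto the basis to obtain its PCA coefficients."""
    return basis @ (frame - mean)

def decode(coeffs, mean, basis):
    """Rebuild an approximate frame from its PCA coefficients."""
    return mean + basis.T @ coeffs

# Synthetic stand-in for a matrix of residual frames (assumption, not real data):
# low-rank structure plus a small amount of noise.
rng = np.random.default_rng(0)
frame_len, n_frames, rank = 64, 500, 5
true_basis = rng.standard_normal((rank, frame_len))
weights = rng.standard_normal((n_frames, rank))
frames = weights @ true_basis + 0.01 * rng.standard_normal((n_frames, frame_len))

mean, basis = fit_eigenresiduals(frames, n_components=rank)
coeffs = encode(frames[0], mean, basis)
approx = decode(coeffs, mean, basis)
rel_error = np.linalg.norm(frames[0] - approx) / np.linalg.norm(frames[0])
```

In a synthesizer along the lines the abstract describes, only the small basis and a stream of per-frame coefficients would need to be stored or generated, which is consistent with the small footprint the paper reports.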
Taxonomy
Topics: Speech and Audio Processing · Speech Recognition and Synthesis · Music and Audio Processing
