Voice Conversion for Whispered Speech Synthesis

Marius Cotescu; Thomas Drugman; Goeric Huybrechts; Jaime Lorenzo-Trueba; Alexis Moinet

arXiv:1912.05289·cs.SD·January 22, 2020

TL;DR

The paper converts normally phonated speech to whispered speech using voice conversion (VC) with GMM and DNN models. The VC approach significantly outperforms rule-based signal processing, produces whisper that listeners cannot distinguish from copy-synthesis of natural whisper recordings, and, in the DNN case, generalizes to unseen speakers.

Contribution

The paper shows that DNN-based voice conversion is effective for whisper synthesis, surpassing rule-based signal processing techniques, and reports its application in Amazon Alexa's Whisper Mode.

Findings

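01 VC techniques significantly outperform rule-based signal processing methods

02 Converted whispers are indistinguishable from copy-synthesis of natural whisper recordings

03 The DNN generalizes well to unseen speakers when trained on data from multiple speakers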

Abstract

We present an approach to synthesize whisper by applying a handcrafted signal processing recipe and Voice Conversion (VC) techniques to convert normally phonated speech to whispered speech. We investigate using Gaussian Mixture Models (GMM) and Deep Neural Networks (DNN) to model the mapping between acoustic features of normal speech and those of whispered speech. We evaluate naturalness and speaker similarity of the converted whisper on an internal corpus and on the publicly available wTIMIT corpus. We show that applying VC techniques is significantly better than using rule-based signal processing methods and it achieves results that are indistinguishable from copy-synthesis of natural whisper recordings. We investigate the ability of the DNN model to generalize on unseen speakers, when trained with data from multiple speakers. We show that excluding the target speaker from the…
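
The frame-wise mapping from normal-speech features to whispered-speech features described in the abstract can be illustrated with a small regression network. This is only a sketch under stated assumptions, not the authors' model: the page does not give the feature type, the alignment method, or the network architecture, so the code below uses synthetic, already time-aligned 8-dimensional frames, a single tanh hidden layer, and plain full-batch gradient descent. The names `train_frame_mapper` and `convert` are hypothetical.

```python
import numpy as np


def train_frame_mapper(x, y, hidden=16, lr=0.05, epochs=300, seed=0):
    """Fit a one-hidden-layer network that maps source frames x to target frames y.

    x: (n_frames, d_in) features of normal speech (assumed time-aligned with y)
    y: (n_frames, d_out) features of whispered speech
    Returns the learned parameters and the per-epoch mean squared error.
    """
    rng = np.random.default_rng(seed)
    d_in, d_out = x.shape[1], y.shape[1]
    w1 = rng.normal(0.0, 0.1, (d_in, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(0.0, 0.1, (hidden, d_out))
    b2 = np.zeros(d_out)
    losses = []
    for _ in range(epochs):
        # Forward pass
        h = np.tanh(x @ w1 + b1)
        pred = h @ w2 + b2
        err = pred - y
        losses.append(float(np.mean(err ** 2)))
        # Backward pass for the mean squared error
        g_pred = 2.0 * err / err.size
        g_w2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ w2.T) * (1.0 - h ** 2)
        g_w1 = x.T @ g_z
        g_b1 = g_z.sum(axis=0)
        # Gradient descent update
        w1 -= lr * g_w1
        b1 -= lr * g_b1
        w2 -= lr * g_w2
        b2 -= lr * g_b2
    return (w1, b1, w2, b2), losses


def convert(params, x):
    """Apply the trained mapping to new source frames."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


# Synthetic stand-in data (an assumption, not real speech features).
rng = np.random.default_rng(1)
normal_frames = rng.normal(size=(200, 8))
true_map = rng.normal(size=(8, 8))
whisper_frames = np.tanh(normal_frames @ true_map) + 0.01 * rng.normal(size=(200, 8))

params, losses = train_frame_mapper(normal_frames, whisper_frames)
converted = convert(params, normal_frames)
```

In a real system the inputs would be acoustic features extracted from parallel normal and whispered recordings, and the converted features would be passed to a vocoder; both steps are outside this sketch.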
