Whispered-to-voiced Alaryngeal Speech Conversion with Generative Adversarial Networks

Santiago Pascual; Antonio Bonafonte; Joan Serrà; Jose A. Gonzalez

arXiv:1808.10687 · cs.SD · November 7, 2018

TL;DR

This paper presents an end-to-end neural method using GANs to convert whispered alaryngeal speech into natural, voiced speech with realistic pitch, improving expressiveness for voice restoration in aphonia patients.

Contribution

It introduces a novel speaker-dependent GAN-based model for direct whispered-to-voiced speech conversion, bypassing traditional vocoder-based prosody prediction methods.

Findings

01 Effective re-generation of voiced speech with realistic pitch contours

02 Preliminary qualitative results show promising speech naturalness

03 Demonstrates potential for improved voice restoration in aphonia

Abstract

Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech. Apart from intelligibility, this type of speech lacks expressiveness and naturalness due to the absence of pitch (whispered speech) or its artificial generation (monotone speech). Existing techniques to restore prosodic information typically combine a vocoder, which parameterises the speech signal, with machine learning techniques that predict prosodic information. In contrast, this paper describes an end-to-end neural approach for estimating a fully-voiced speech waveform from whispered alaryngeal speech. By adapting our previous work in speech enhancement with generative adversarial networks, we develop a speaker-dependent model to perform whispered-to-voiced speech conversion. Preliminary qualitative results show effectiveness in re-generating voiced speech, with the…
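
The abstract names adversarial training but does not give the objective. As a rough illustration only, the sketch below assumes a least-squares adversarial loss combined with an L1 waveform term, a common pairing in waveform-to-waveform GANs; it is not confirmed by this page to be the paper's loss. The function names, the `l1_weight` parameter, and its default value are hypothetical placeholders.

```python
from typing import Sequence


def _mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def discriminator_loss(d_real: Sequence[float], d_fake: Sequence[float]) -> float:
    """Assumed least-squares objective: push scores for real (voiced)
    speech toward 1 and scores for generated speech toward 0."""
    real_term = _mean([(s - 1.0) ** 2 for s in d_real])
    fake_term = _mean([s ** 2 for s in d_fake])
    return 0.5 * real_term + 0.5 * fake_term


def generator_loss(
    d_fake: Sequence[float],
    generated: Sequence[float],
    target: Sequence[float],
    l1_weight: float = 1.0,  # hypothetical weighting, not taken from the paper
) -> float:
    """Assumed generator objective: fool the discriminator (scores toward 1)
    while staying close to the voiced target waveform in L1 distance."""
    adversarial = 0.5 * _mean([(s - 1.0) ** 2 for s in d_fake])
    l1 = _mean([abs(g - t) for g, t in zip(generated, target)])
    return adversarial + l1_weight * l1
```

Here `d_real` and `d_fake` stand for discriminator scores on natural voiced speech and on the generator's output, and `generated` and `target` stand for waveform samples. In a speaker-dependent setup, all of these would come from recordings of a single speaker.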
