TL;DR
This paper presents an end-to-end neural method using GANs to convert whispered alaryngeal speech into natural, voiced speech with realistic pitch, improving expressiveness for voice restoration in aphonia patients.
Contribution
It introduces a novel speaker-dependent GAN-based model for direct whispered-to-voiced speech conversion, bypassing traditional vocoder-based prosody prediction methods.
Findings
Effective re-generation of voiced speech with realistic pitch contours
Preliminary qualitative results show promising speech naturalness
Demonstrates potential for improved voice restoration in aphonia
Abstract
Most methods of voice restoration for patients suffering from aphonia produce either whispered or monotone speech. Apart from intelligibility, this type of speech lacks expressiveness and naturalness due to the absence of pitch (whispered speech) or artificial generation of it (monotone speech). Existing techniques to restore prosodic information typically combine a vocoder, which parameterises the speech signal, with machine learning techniques that predict prosodic information. In contrast, this paper describes an end-to-end neural approach for estimating a fully-voiced speech waveform from whispered alaryngeal speech. By adapting our previous work in speech enhancement with generative adversarial networks, we develop a speaker-dependent model to perform whispered-to-voiced speech conversion. Preliminary qualitative results show effectiveness in re-generating voiced speech, with the…
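To make the abstract's point about the absence of pitch in whispered speech concrete, here is a minimal illustrative sketch. It is not the paper's method and uses no real speech data. It assumes a pure 120 Hz sine as a stand-in for a voiced frame and white noise as a stand-in for a whispered frame (real whispered speech is spectrally shaped noise, so this is a simplification). The function name `periodicity`, the 60 to 400 Hz search range, and the 16 kHz sample rate are all hypothetical choices made for illustration.

```python
import numpy as np

def periodicity(frame, sample_rate, f0_min=60.0, f0_max=400.0):
    """Peak of the normalized autocorrelation within a plausible pitch-lag range.

    Values near 1 suggest a strongly periodic (voiced-like) frame;
    values near 0 suggest an aperiodic (whisper-like) frame.
    """
    frame = frame - np.mean(frame)
    energy = np.dot(frame, frame)
    if energy == 0.0:
        return 0.0
    # Full autocorrelation; keep only the non-negative lags.
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    ac = ac / energy  # lag 0 becomes exactly 1
    lag_min = int(sample_rate / f0_max)  # shortest pitch period considered
    lag_max = int(sample_rate / f0_min)  # longest pitch period considered
    return float(np.max(ac[lag_min:lag_max + 1]))

sample_rate = 16000  # assumed sample rate, in Hz
t = np.arange(int(0.04 * sample_rate)) / sample_rate  # one 40 ms frame
rng = np.random.default_rng(0)

# Assumed stand-ins, not real recordings:
voiced_like = np.sin(2 * np.pi * 120.0 * t)     # periodic at 120 Hz
whisper_like = rng.standard_normal(t.shape[0])  # aperiodic noise

voiced_score = periodicity(voiced_like, sample_rate)
whisper_score = periodicity(whisper_like, sample_rate)
print(f"voiced-like: {voiced_score:.2f}, whisper-like: {whisper_score:.2f}")
```

The periodic frame scores much higher than the noise frame. Under these assumptions, that gap is the pitch structure a whispered-to-voiced conversion model would need to re-generate.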
