Pathological voice adaptation with autoencoder-based voice conversion
Marc Illa, Bence Mark Halpern, Rob van Son, Laureano Moro-Velazquez, Odette Scharenborg

TL;DR
This paper introduces a pathological speech synthesis method that adapts existing pathological speech samples to a new speaker's voice using autoencoder-based voice conversion, so that the model only has to be optimised for the speaker change and not for speech degradation.
Contribution
It presents a new approach that simplifies pathological speech conversion by focusing on speaker change rather than speech degradation, validated with a proof of concept on dysarthric speech from the UASpeech database.
Findings
Reasonable naturalness for high intelligibility speakers
Successful speaker characteristic conversion for low and high intelligibility speakers
Lower intelligibility speakers show marginal naturalness degradation
Abstract
In this paper, we propose a new approach to pathological speech synthesis. Instead of using healthy speech as a source, we customise an existing pathological speech sample to a new speaker's voice characteristics. This approach alleviates the evaluation problem one normally has when converting typical speech to pathological speech, as in our approach, the voice conversion (VC) model does not need to be optimised for speech degradation but only for the speaker change. This change in the optimisation ensures that any degradation found in naturalness is due to the conversion process and not due to the model exaggerating characteristics of a speech pathology. To show a proof of concept of this method, we convert dysarthric speech using the UASpeech database and an autoencoder-based VC technique. Subjective evaluation results show reasonable naturalness for high intelligibility dysarthric…
