Lightweight and perceptually-guided voice conversion for electro-laryngeal speech
Benedikt Mayrhofer, Franz Pernkopf, Philipp Aichinger, Martin Hagmüller

TL;DR
This paper introduces a lightweight, perceptually-guided voice conversion method tailored for electro-laryngeal speech, significantly improving naturalness and intelligibility by combining self-supervised pretraining with supervised fine-tuning.
Contribution
The paper adapts the StreamVC framework to EL speech by removing the pitch and energy modules and by adding perceptual and intelligibility losses during supervised fine-tuning on parallel EL and healthy (HE) speech.
Findings
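The best model variant (+WavLM+HF) drastically reduces the character error rate (CER) of EL inputs
The naturalness mean opinion score (nMOS) rises from 1.1 to 3.3
The gap to healthy (HE) ground-truth speech narrows consistently across all evaluated metrics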
Abstract
Electro-laryngeal (EL) speech is characterized by constant pitch, limited prosody, and mechanical noise, reducing naturalness and intelligibility. We propose a lightweight adaptation of the state-of-the-art StreamVC framework to this setting by removing pitch and energy modules and combining self-supervised pretraining with supervised fine-tuning on parallel EL and healthy (HE) speech data, guided by perceptual and intelligibility losses. Objective and subjective evaluations across different loss configurations confirm their influence: the best model variant, based on WavLM features and human-feedback predictions (+WavLM+HF), drastically reduces character error rate (CER) of EL inputs, raises naturalness mean opinion score (nMOS) from 1.1 to 3.3, and consistently narrows the gap to HE ground-truth speech in all evaluated metrics. These findings demonstrate the feasibility of adapting…
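For readers unfamiliar with the intelligibility metric named above, the following is a minimal sketch of how a character error rate is commonly computed: the Levenshtein (edit) distance between a reference transcript and a recognizer's hypothesis, divided by the reference length. The function names `edit_distance` and `cer` are illustrative assumptions, and the paper's actual evaluation pipeline (ASR model, text normalization) is not described in this summary, so it may differ.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character error rate: edit distance normalized by reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only (not data from the paper).
    print(cer("hello", "hallo"))
```

Lower values indicate that the recognized text is closer to the reference. Because the distance is normalized by the reference length only, CER can exceed 1.0 when the hypothesis contains many inserted characters.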
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Stuttering Research and Treatment
