Improving severity preservation of healthy-to-pathological voice conversion with global style tokens
Bence Mark Halpern, Wen-Chin Huang, Lester Phillip Violeta, R.J.J.H. van Son, Tomoki Toda

TL;DR
This paper enhances healthy-to-pathological voice conversion by integrating global style tokens and phonetic posteriorgrams, resulting in better severity preservation and more accurate speaker identity transfer, supported by a new parallel dataset and expert listening evaluations.
Contribution
It introduces a novel approach combining GST and PPG for improved severity preservation in voice conversion, along with a new dataset for precise evaluation.
Findings
Severity is better preserved with GST and PPG integration.
Pathology affects x-vectors, but they still retain some speaker information.
Severity labels alone are insufficient for source speaker selection.
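The x-vector finding can be illustrated with a minimal sketch. It assumes speaker similarity is scored with cosine similarity between embeddings, a common choice that this summary does not confirm as the paper's method. The three vectors are made-up toy values, not real x-vectors, and the function name is hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


# Toy 4-dimensional stand-ins for x-vectors (assumption: real embeddings
# come from a trained speaker encoder and have far more dimensions).
healthy_a = [1.0, 0.0, 1.0, 0.0]       # speaker A, healthy recording
pathological_a = [1.0, 0.5, 0.5, 0.0]  # speaker A, pathological recording
healthy_b = [0.0, 1.0, 0.0, 1.0]       # speaker B, healthy recording

same_speaker = cosine_similarity(healthy_a, pathological_a)
different_speaker = cosine_similarity(healthy_a, healthy_b)

print(f"same speaker, healthy vs. pathological: {same_speaker:.3f}")
print(f"different speakers: {different_speaker:.3f}")
```

In this toy setup the same-speaker score falls below 1.0 but stays above the different-speaker score, which is the pattern the finding describes: the embedding shifts with pathology without losing all speaker information.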
Abstract
In healthy-to-pathological voice conversion (H2P-VC), healthy speech is converted into pathological speech while preserving the speaker's identity. The paper improves on a previous two-stage approach to H2P-VC in which (1) speech is first created with the appropriate severity, and (2) the speaker identity of the voice is then converted while preserving its severity. Specifically, we propose improvements to (2) by using phonetic posteriorgrams (PPG) and global style tokens (GST). Furthermore, we present a new dataset that contains parallel recordings of pathological and healthy speakers with the same identity, which allows more precise evaluation. Listening tests by expert listeners show that the framework preserves the severity of the source sample while modelling the target speaker's voice. We also show that (a) pathology impacts x-vectors but not all speaker information is lost, (b) choosing source…
Taxonomy
Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Dysphagia Assessment and Management
