Improving severity preservation of healthy-to-pathological voice conversion with global style tokens

Bence Mark Halpern; Wen-Chin Huang; Lester Phillip Violeta; R.J.J.H. van Son; Tomoki Toda

arXiv:2310.02570·cs.SD·October 5, 2023

Open Access

TL;DR

This paper enhances healthy-to-pathological voice conversion by integrating global style tokens and phonetic posteriorgrams, resulting in better severity preservation and more accurate speaker identity transfer, supported by a new parallel dataset and expert listening evaluations.

Contribution

It introduces a novel approach that combines global style tokens (GST) and phonetic posteriorgrams (PPG) to improve severity preservation in voice conversion, along with a new dataset for more precise evaluation.

Findings

01. Severity is better preserved with GST and PPG integration.

02. Pathology affects x-vectors, but they still retain some speaker information.

03. Severity labels alone are insufficient for source speaker selection.

Abstract

In healthy-to-pathological voice conversion (H2P-VC), healthy speech is converted into pathological speech while preserving the speaker's identity. This paper improves on a previous two-stage approach to H2P-VC in which (1) speech is first created with the appropriate severity, and (2) the speaker identity of the voice is then converted while preserving the severity of the voice. Specifically, we propose improvements to (2) by using phonetic posteriorgrams (PPG) and global style tokens (GST). Furthermore, we present a new dataset that contains parallel recordings of pathological and healthy speakers with the same identity, which allows more precise evaluation. Listening tests by expert listeners show that the framework preserves the severity of the source sample while modelling the target speaker's voice. We also show that (a) pathology impacts x-vectors but not all speaker information is lost, (b) choosing source…
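To make the role of global style tokens in stage (2) more concrete, the following is a minimal sketch of the general GST idea: a reference embedding attends over a bank of tokens, and the resulting weighted sum is used as a style embedding. It is not the authors' implementation. The function names, the single-head scaled dot-product attention, the random token bank, and all sizes are assumptions chosen for illustration; a real GST layer learns its tokens and typically uses multi-head attention on the output of a reference encoder.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def style_embedding(reference, tokens):
    """Attend from a reference embedding over a bank of style tokens.

    reference: list of floats summarising a reference utterance
               (hypothetical; a real system gets this from a reference encoder).
    tokens:    list of token vectors (learned in a real model, random here).
    Returns (weights, embedding): attention weights and the weighted token sum.
    """
    dim = len(reference)
    # Scaled dot-product score between the reference and each token.
    scores = [
        sum(r * t for r, t in zip(reference, tok)) / math.sqrt(dim)
        for tok in tokens
    ]
    weights = softmax(scores)
    # The style embedding is a convex combination of the tokens.
    embedding = [
        sum(w * tok[i] for w, tok in zip(weights, tokens)) for i in range(dim)
    ]
    return weights, embedding


rng = random.Random(0)
NUM_TOKENS, DIM = 4, 8  # illustrative sizes, not taken from the paper
token_bank = [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_TOKENS)]
reference_vec = [rng.gauss(0.0, 1.0) for _ in range(DIM)]
weights, style = style_embedding(reference_vec, token_bank)
```

In a conversion model of this kind, the style embedding would be fed to the decoder alongside the PPG-derived content features, so that properties of the reference utterance not captured by the phonetic content can still be carried through.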

Taxonomy

Topics: Voice and Speech Disorders · Speech Recognition and Synthesis · Dysphagia Assessment and Management