Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

Disong Wang; Songxiang Liu; Xixin Wu; Hui Lu; Lifa Sun; Xunying Liu; Helen Meng

arXiv:2202.09082·eess.AS·February 21, 2022

TL;DR

This paper introduces an adversarial speaker adaptation method for dysarthric speech reconstruction that improves speaker identity preservation, with reported word error rate reductions of 22.3% and 31.5% for different dysarthria severities.

Contribution

The paper proposes adversarial speaker adaptation (ASA), a multi-task learning strategy that fine-tunes the speaker encoder on the target dysarthric speaker's speech while applying adversarial training to keep abnormal speaking patterns out of the reconstructed speech.

Findings

01

Enhanced speaker similarity in reconstructed speech.

02

Achieved 22.3% and 31.5% word error rate reduction for different dysarthria severities.

03

Comparable speech naturalness to baseline methods.

Abstract

Dysarthric speech reconstruction (DSR), which aims to improve the quality of dysarthric speech, remains a challenge because we must not only restore the speech to normal but also preserve the speaker's identity. The speaker representation extracted by a speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity. However, the SE may not fully capture the characteristics of previously unseen dysarthric speakers. To address this research problem, we propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA). The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed…

Taxonomy

Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing