Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation
Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng

TL;DR
This paper introduces adversarial speaker adaptation (ASA) for dysarthric speech reconstruction. The method improves speaker identity preservation and speech quality, and reduces word error rate by 22.3% and 31.5% for speakers of differing dysarthria severity.
Contribution
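It proposes a novel multi-task learning strategy, adversarial speaker adaptation (ASA). The primary task fine-tunes the speaker encoder on the speech of the target dysarthric speaker so that it better captures speaker identity, and the secondary task applies adversarial training to keep abnormal speaking patterns out of the reconstructed speech.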
Findings
Enhanced speaker similarity in reconstructed speech.
Word error rate reductions of 22.3% and 31.5% for different dysarthria severities.
Comparable speech naturalness to baseline methods.
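For readers unfamiliar with the metric behind the second finding, the sketch below shows how word error rate (WER) is conventionally computed: a word-level edit distance divided by the number of reference words. The summary above does not say whether the reported reductions are absolute or relative, so the helper returns both. The function names and example sentences are illustrative assumptions, not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def wer_reduction(baseline_wer: float, system_wer: float) -> tuple[float, float]:
    """Return (absolute, relative) WER reduction of a system over a baseline."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    absolute = baseline_wer - system_wer
    relative = absolute / baseline_wer
    return absolute, relative
```

For example, `word_error_rate("the cat sat on the mat", "the cat sit on mat")` counts one substitution and one deletion against six reference words. When comparing reported numbers, it matters which of the two reductions is meant, since a fixed absolute drop corresponds to very different relative drops depending on the baseline.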
Abstract
Dysarthric speech reconstruction (DSR), which aims to improve the quality of dysarthric speech, remains a challenge, not only because the speech must be restored to normal, but also because the speaker's identity must be preserved. The speaker representation extracted by a speaker encoder (SE) optimized for speaker verification has been explored to control the speaker identity. However, the SE may not be able to fully capture the characteristics of previously unseen dysarthric speakers. To address this research problem, we propose a novel multi-task learning strategy, i.e., adversarial speaker adaptation (ASA). The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed…
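The two-task structure described in the abstract can be illustrated with a toy objective: a primary reconstruction term plus a weighted adversarial term. This is only a minimal sketch under stated assumptions. The abstract does not give the actual loss functions, so the mean-absolute-error reconstruction loss, the non-saturating generator loss, the `adv_weight` parameter, and all function names here are hypothetical stand-ins rather than the authors' formulation.

```python
import math


def reconstruction_loss(predicted: list[float], target: list[float]) -> float:
    """Assumed primary-task loss: mean absolute error between feature values."""
    if not predicted or len(predicted) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - t) for p, t in zip(predicted, target)) / len(predicted)


def adversarial_loss(disc_scores: list[float]) -> float:
    """Assumed secondary-task loss on the generator side: -mean(log D(x)).

    disc_scores are hypothetical discriminator probabilities in (0, 1] that
    the reconstructed speech looks like normal, high-quality speech.
    """
    if not disc_scores:
        raise ValueError("disc_scores must be non-empty")
    eps = 1e-12  # guards against log(0)
    return -sum(math.log(max(s, eps)) for s in disc_scores) / len(disc_scores)


def combined_objective(
    predicted: list[float],
    target: list[float],
    disc_scores: list[float],
    adv_weight: float = 0.5,  # hypothetical trade-off weight
) -> float:
    """Multi-task objective: primary loss plus weighted adversarial loss."""
    return reconstruction_loss(predicted, target) + adv_weight * adversarial_loss(disc_scores)
```

In this sketch, a discriminator that is fully convinced (scores of 1.0) contributes nothing to the objective, so only the reconstruction term remains. Lower scores raise the adversarial term and push the model away from outputs the discriminator can tell apart from normal speech.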
Taxonomy
Topics: Speech Recognition and Synthesis · Voice and Speech Disorders · Speech and Audio Processing
